OPeNDAP

From Wikipedia, the free encyclopedia

OPeNDAP is an acronym for "Open-source Project for a Network Data Access Protocol," an endeavor focused on enhancing the retrieval of remote, structured data through a Web-based architecture and a discipline-neutral Data Access Protocol (DAP). Widely used, especially in Earth science, the protocol is layered on HTTP, and its current specification is DAP4,[1] though the previous DAP2 version remains broadly used. Developed and advanced (openly and collaboratively) by the non-profit OPeNDAP, Inc.,[2] DAP is intended to enable remote, selective data-retrieval as an easily invoked Web service. OPeNDAP, Inc. also develops and maintains zero-cost (reference) implementations of the DAP protocol in both server-side and client-side software.

"OPeNDAP" often is used in place of "DAP" to denote the protocol but also may refer to an entire DAP-based data-retrieval architecture. Other DAP-centered architectures, such as THREDDS[3] and ERDDAP, the NOAA GEO-IDE UAF ERDDAP[4] exhibit significant interoperability with one another as well as with systems employing OPeNDAP's own (open-source) servers and software.

A DAP client can be an ordinary browser or even a spreadsheet, though with limited functionality (see OPeNDAP's Web page on Available Client Software). More typically, DAP clients are:

  • Data-analysis or data-visualization tools (such as MATLAB, IDL, Panoply, GrADS, Integrated Data Viewer, Ferret and ncBrowse[5]) which their authors have adapted to enable DAP-based data input;
  • Similarly adapted Web applications (such as Dapper Data Viewer, aka DChart)[6]
  • Similarly adapted end-user programs (in common languages)

Regardless of their types, and whether developed commercially or by an end-user, clients almost universally link to DAP servers through libraries that implement the DAP2 or DAP4 protocol in one language or another. OPeNDAP offers open-source libraries in C++ and Java, but many clients rely on community developed libraries such as PyDAP or, especially, the NetCDF suite. Developed and maintained by the Unidata Program at the UCAR in multiple programming languages, all NetCDF libraries include embedded capabilities for retrieving (array-style) data from DAP servers.

A data-using client references a data set by its URL and requests metadata or content by issuing (usually through an embedded DAP library) an HTTP request to a DAP server. Content requests usually are preceded by requests for metadata describing the structure and other details about the referenced data set. With this information, the client may construct DAP constraint expressions[7] to retrieve specific content (i.e., subsets) from the source. OPeNDAP servers offer various types of responses, depending on the specific form of the client's request, including XML, JSON, HTML and ASCII. In response to requests for content, OPeNDAP servers can respond with multi-part mime documents that include a binary portion with NetCDF or DAP-native encoding. (These binary forms offer compact means to deliver large volumes of content, and the DAP-native form may even be streamed if desired.)

OPeNDAP's software for building DAP servers (on top of Apache) is dubbed Hyrax and includes adapters that facilitate serving a wide variety of source data. DAP servers most frequently enable (remote) access to (large) HDF or NetCDF files, but the source data can exist in databases or other formats, including user-defined ones. When source data are organized as files, DAP retrievals enable, via subsetting, finer-grained access than does the FTP. Furthermore, OPeNDAP servers can aggregate subsets from multiple files for delivery in a single retrieval. Taken together, subsetting, aggregation and streaming can yield substantial data-access efficiencies, even in the presence of slow networks.

OPeNDAP and other DAP servers are used operationally in government agencies, including NASA and NOAA, for providing access to Earth science data, including satellite imagery and other high-volume information sources. The DAP data model embraces a comprehensive set of data structures, including multidimensional arrays and nested sequences (i.e., records), complemented by a correspondingly rich set of constraint expressions. Hence the OPeNDAP data-retrieval architecture has demonstrated utility across a broad range of scientific data types, including data generated via simulations and data generated via observations (whether remotely sensed or measured in situ).

References[edit]

External links[edit]