ERDDAP ERDDAPY

From gfi

Revision as of 14:53, 19 January 2022

How to use erddapy

First of all, we need to instantiate the ERDDAP URL constructor for a server.

server
an ERDDAP server URL or an acronym for one of the builtin servers.
from erddapy import ERDDAP
import pandas as pd
e = ERDDAP(server="https://erddap.bcdc.no/erddap")

To explore the methods and attributes available in the ERDDAP object:


[method for method in dir(e) if not method.startswith("_")]


['auth', 'constraints', 'dataset_id', 'get_categorize_url', 'get_download_url', 'get_info_url', 'get_search_url', 'get_var_by_attr', 'protocol', 'relative_constraints', 'requests_kwargs', 'response', 'server', 'server_functions', 'to_iris', 'to_ncCF', 'to_pandas', 'to_xarray', 'variables']

Note: The methods whose names start with get_ and end with _url return a valid ERDDAP URL for the requested response and options; get_var_by_attr instead returns a list of variable names.
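
The URL builders are easy to pick out of the listing by name. The short sketch below works on a copy of the names printed above, so it runs without a server; the variable names members and url_builders are only illustrative.

```python
# Names copied from the dir(e) output above.
members = [
    "auth", "constraints", "dataset_id", "get_categorize_url",
    "get_download_url", "get_info_url", "get_search_url",
    "get_var_by_attr", "protocol", "relative_constraints",
    "requests_kwargs", "response", "server", "server_functions",
    "to_iris", "to_ncCF", "to_pandas", "to_xarray", "variables",
]

# Keep only the names that start with get_ and end with _url.
url_builders = [m for m in members if m.startswith("get_") and m.endswith("_url")]
print(url_builders)
```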

To get help on a method:

help(e.get_search_url)
Help on method get_search_url in module erddapy.erddapy:
get_search_url(response: Union[str, NoneType] = None, search_for: Union[str, NoneType] = None, protocol: Union[str, NoneType] = None, items_per_page: int = 1000, page: int = 1, **kwargs) -> str method of erddapy.erddapy.ERDDAP instance
The search URL for the `server` endpoint provided.
Args:
search_for: "Google-like" search of the datasets' metadata.
- Type the words you want to search for, with spaces between the words.
ERDDAP will search for the words separately, not as a phrase.
- To search for a phrase, put double quotes around the phrase (for example, `"wind speed"`).
- To exclude datasets with a specific word, use `-excludedWord`.
- To exclude datasets with a specific phrase, use `-"excluded phrase"`
- Searches are not case-sensitive.
- You can search for any part of a word. For example, searching for `spee` will find datasets with `speed` and datasets with `WindSpeed`
- The last word in a phrase may be a partial word. For example, to find datasets from a specific website (usually the start of the datasetID), include (for example) `"datasetID=erd"` in your search.
response: default is HTML.
items_per_page: how many items per page in the return, default is 1000.
page: which page to display, default is the first page (1).
kwargs: extra search constraints based on metadata and/or coordinates ke/value.
metadata: `cdm_data_type`, `institution`, `ioos_category`, `keywords`, `long_name`, `standard_name`, and `variableName`.
coordinates: `minLon`, `maxLon`, `minLat`, `maxLat`, `minTime`, and `maxTime`.
Returns:
url: the search URL.

ERDDAP users can then:

  • access the list of all datasets available through this ERDDAP server
  • access the list of datasets by type (grid, tabular, ...)

Access the list of all datasets available through this ERDDAP server

Here we use the get_search_url method

# show all datasets
url = e.get_search_url()
print(url)

https://erddap.icos-cp.eu/erddap/search/advanced.html?page=1&itemsPerPage=1000&protocol=(ANY)&cdm_data_type=(ANY)&institution=(ANY)&ioos_category=(ANY)&keywords=(ANY)&long_name=(ANY)&standard_name=(ANY)&variableName=(ANY)&minLon=(ANY)&maxLon=(ANY)&minLat=(ANY)&maxLat=(ANY)
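
The URL printed above follows a regular pattern: every search field that is not set is sent as (ANY). The sketch below reproduces that pattern by hand, assuming the field order seen in the printed URL. build_search_url and SEARCH_FIELDS are hypothetical names written for illustration; they are not part of erddapy, and the helper does not URL-encode its values.

```python
# Field names in the order they appear in the URL printed above.
SEARCH_FIELDS = [
    "protocol", "cdm_data_type", "institution", "ioos_category",
    "keywords", "long_name", "standard_name", "variableName",
    "minLon", "maxLon", "minLat", "maxLat",
]


def build_search_url(server, response="html", page=1, items_per_page=1000, **fields):
    """Hypothetical helper: assemble an advanced-search URL by hand."""
    unknown = set(fields) - set(SEARCH_FIELDS)
    if unknown:
        raise ValueError(f"unknown search fields: {sorted(unknown)}")
    query = [f"page={page}", f"itemsPerPage={items_per_page}"]
    # Any field the caller did not set falls back to the (ANY) wildcard.
    # Values are inserted as-is; a real client should URL-encode them.
    query += [f"{name}={fields.get(name, '(ANY)')}" for name in SEARCH_FIELDS]
    return f"{server.rstrip('/')}/search/advanced.{response}?" + "&".join(query)


url = build_search_url("https://erddap.icos-cp.eu/erddap")
print(url)
```

Passing a field, for example build_search_url(server, protocol="tabledap"), replaces the matching (ANY) entry and leaves the others untouched.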

We can also set the response attribute on our ERDDAP instance.

response
specifies the type of table data file that you want to download (default html). Many response types are available; see the ERDDAP documentation for griddap and tabledap respectively.
# show all datasets
e.response='csv'
url = e.get_search_url(search_for="all")
df = pd.read_csv(url)
df[['griddap','tabledap','Dataset ID']].head()
griddap tabledap Dataset ID
0 NaN https://erddap.icos-cp.eu/erddap/tabledap/allD... allDatasets
1 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170409SocatEnhanced
2 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170421SocatEnhanced
3 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170430SocatEnhanced
4 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170511SocatEnhanced


Access the list of datasets by type (grid, tabular, ...)

Here we use the get_search_url method and also set the response and protocol attributes on our ERDDAP instance.

response
specifies the type of table data file that you want to download (default html).
protocol
choose between tabledap or griddap.
# show datasets by type
e.response='csv'
e.protocol='tabledap'
url = e.get_search_url()
df = pd.read_csv(url)
df[['griddap','tabledap','Dataset ID']].head()
griddap tabledap Dataset ID
0 NaN https://erddap.icos-cp.eu/erddap/tabledap/allD... allDatasets
1 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170409SocatEnhanced
2 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170421SocatEnhanced
3 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170430SocatEnhanced
4 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170511SocatEnhanced

As a user, you probably don't want to work with every dataset, and you certainly don't want to open each one to find out which ones contain the data you are interested in.

How to search datasets

ERDDAP users can select datasets by:

  • Full text search (Google-like search of the datasets' metadata)
  • Category search
  • Advanced search

How to use Full text search

Here we use the get_search_url method and also set the response and protocol attributes on our ERDDAP instance.

response
specifies the type of table data file that you want to download (default html).
protocol
choose between tabledap or griddap.
search_for
“Google-like” search of the datasets’ metadata.
  • Type the words you want to search for, with spaces between the words. ERDDAP will search for the words separately, not as a phrase.
  • To search for a phrase, put double quotes around the phrase (for example, "wind speed").
  • To exclude datasets with a specific word, use -excludedWord .
  • To exclude datasets with a specific phrase, use -"excluded phrase" .
  • Don't use AND between search terms. It is implied. The results will include only the datasets that have all of the specified words and phrases (and none of the excluded words and phrases) in the dataset's metadata (data about the dataset).
  • Searches are not case-sensitive.
  • To search for specific attribute values, use attName=attValue .
  • To find just grid or just table datasets, include protocol=griddap or protocol=tabledap in your search.
  • This ERDDAP is using searchEngine=original.
  • In this ERDDAP, you can search for any part of a word. For example, searching for spee will find datasets with speed and datasets with WindSpeed.
  • In this ERDDAP, the last word in a phrase may be a partial word. For  example, to find datasets from a specific website (usually the start of the datasetID), include (for example) "datasetID=erd" in your search.
# show datasets selected by full text search
e.response='csv'
e.protocol='tabledap'
url = e.get_search_url(search_for='fCO2')
df = pd.read_csv(url)
print(f'{len(set(df["tabledap"].dropna()))} matching tabledap datasets')
df[['griddap','tabledap','Dataset ID']].head()

119 matching tabledap datasets

griddap tabledap Dataset ID
0 NaN https://erddap.icos-cp.eu/erddap/tabledap/allD... allDatasets
1 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170409SocatEnhanced
2 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170421SocatEnhanced
3 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170430SocatEnhanced
4 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170511SocatEnhanced
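
The search_for rules listed above (plain words, quoted phrases, and a leading minus sign for exclusions) can be sketched as a small string builder. make_search_for is a hypothetical helper, and the terms "wind speed" and "model" are only illustrative; the resulting string is what would be passed as search_for.

```python
from urllib.parse import quote_plus


def make_search_for(words=(), phrases=(), exclude_words=(), exclude_phrases=()):
    """Hypothetical helper: compose a search_for string from its parts."""
    parts = list(words)
    parts += [f'"{p}"' for p in phrases]            # phrases go in double quotes
    parts += [f"-{w}" for w in exclude_words]       # -word excludes a word
    parts += [f'-"{p}"' for p in exclude_phrases]   # -"phrase" excludes a phrase
    # Terms are joined with spaces; AND is implied, so it is never inserted.
    return " ".join(parts)


query = make_search_for(words=["fCO2"], phrases=["wind speed"], exclude_words=["model"])
print(query)              # fCO2 "wind speed" -model
print(quote_plus(query))  # the same query, escaped for use inside a URL
```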

How to get info on metadata

erddapy comes with a method named get_info_url for exploring a dataset's metadata.

# get metadata information
e.response='csv'
e.dataset_id=df['Dataset ID'].values[1]
info_url = e.get_info_url()
info = pd.read_csv(info_url)
info.head(6)
Row Type Variable Name Attribute Name Data Type Value
0 attribute NC_GLOBAL acquisition_ended_at_time String 2017-04-16T14:21:09Z
1 attribute NC_GLOBAL acquisition_started_at_time String 2017-04-10T14:01:01Z
2 attribute NC_GLOBAL acquisition_station_class String 1
3 attribute NC_GLOBAL acquisition_station_comment String The research vessel (R/V) G.O. Sars is own and...
4 attribute NC_GLOBAL acquisition_station_country_code String NO
5 attribute NC_GLOBAL acquisition_station_id String 58G2
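
Once the info table is loaded, the global attributes are easier to use as a dictionary. The sketch below uses the standard csv module on a few rows copied from the table above, so it runs offline; it assumes the CSV column names match the table header shown, and global_attrs is an illustrative name.

```python
import csv
import io

# A few rows copied from the info table above, written as CSV text.
sample = """Row Type,Variable Name,Attribute Name,Data Type,Value
attribute,NC_GLOBAL,acquisition_ended_at_time,String,2017-04-16T14:21:09Z
attribute,NC_GLOBAL,acquisition_started_at_time,String,2017-04-10T14:01:01Z
attribute,NC_GLOBAL,acquisition_station_country_code,String,NO
attribute,NC_GLOBAL,acquisition_station_id,String,58G2
"""

# Collect the global (NC_GLOBAL) attributes into a plain dictionary.
global_attrs = {
    row["Attribute Name"]: row["Value"]
    for row in csv.DictReader(io.StringIO(sample))
    if row["Row Type"] == "attribute" and row["Variable Name"] == "NC_GLOBAL"
}
print(global_attrs["acquisition_station_id"])
```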

How to use Category search

Here we use the get_categorize_url method

categorize_by
a valid attribute, e.g.: ioos_category or standard_name
value
an attribute value.
# show datasets selected by category search
e.response='csv'
url = e.get_categorize_url(categorize_by='standard_name', value='surface_air_pressure')
df = pd.read_csv(url)
print(f'{len(set(df["tabledap"].dropna()))} matching tabledap datasets')
df[['griddap','tabledap','Dataset ID']].head()
griddap tabledap Dataset ID
0 NaN https://erddap.icos-cp.eu/erddap/tabledap/allD... allDatasets
1 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170409SocatEnhanced
2 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170421SocatEnhanced
3 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170430SocatEnhanced
4 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170511SocatEnhanced
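
For orientation, a category-search URL is organised as a path: the attribute to categorize by, then the attribute value. The sketch below assembles such a URL by hand under that assumption about ERDDAP's categorize layout; build_categorize_url is a hypothetical helper, not the erddapy implementation.

```python
def build_categorize_url(server, categorize_by=None, value=None, response="html"):
    """Hypothetical helper: assemble a category-search URL by hand."""
    parts = [server.rstrip("/"), "categorize"]
    if categorize_by:
        parts.append(categorize_by)
        # A value only makes sense once the attribute is known.
        if value:
            parts.append(value)
    return "/".join(parts) + f"/index.{response}"


url = build_categorize_url(
    "https://erddap.icos-cp.eu/erddap",
    categorize_by="standard_name",
    value="surface_air_pressure",
    response="csv",
)
print(url)
```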

How to get variable name given attribute value

erddapy comes with a method named get_var_by_attr that selects variables by their attribute values.

# use the first dataset returned by the search (row 0 is the allDatasets entry)
e.dataset_id=df['Dataset ID'].values[1]
# get variables by their attribute values
variable_list = e.get_var_by_attr(standard_name="surface_air_pressure")
variable_list

['NCEP_SLP']
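
Conceptually, this kind of lookup filters a dataset's variables on their attributes. The sketch below shows the idea on a hand-written table and is not the erddapy implementation: select_by_attr is a hypothetical function, and apart from the NCEP_SLP entry taken from the result above, the table contents are placeholders.

```python
# Hypothetical attribute table; in practice this comes from the dataset's metadata.
variables = {
    "NCEP_SLP": {"standard_name": "surface_air_pressure"},
    "pCO2": {"long_name": "illustrative placeholder"},
}


def select_by_attr(variables, **wanted):
    """Return the names of variables whose attributes match every key=value pair."""
    return [
        name
        for name, attrs in variables.items()
        if all(attrs.get(key) == value for key, value in wanted.items())
    ]


print(select_by_attr(variables, standard_name="surface_air_pressure"))
```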

How to use Advanced search

Here again we use the get_search_url method

response
specifies the type of table data file that you want to download (default html).
protocol
choose between tabledap or griddap.
search_for
“Google-like” search of the datasets’ metadata.

We can also add other constraints on the search area, the time span, and one or several categories. They are collected in a dictionary and passed to get_search_url as keyword arguments.

# show datasets selected by advanced search
# search constraints are keyword arguments of get_search_url;
# e.constraints only applies to data downloads
kw = {
   "standard_name": "surface_air_pressure",
   "max_lon": -69.0,
   "max_lat": 41.0,
   "min_time": "2016-07-10T00:00:00Z",
   "max_time": "2016-08-10T00:00:00Z"
}
url = e.get_search_url(response="csv", **kw)
# load the search result before counting the matches
df = pd.read_csv(url)
print(f'{len(set(df["tabledap"].dropna()))} matching tabledap datasets')
df[['griddap','tabledap','Dataset ID']].head()

108 matching tabledap datasets

griddap tabledap Dataset ID
0 NaN https://erddap.icos-cp.eu/erddap/tabledap/allD... allDatasets
1 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170409SocatEnhanced
2 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170421SocatEnhanced
3 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170430SocatEnhanced
4 NaN https://erddap.icos-cp.eu/erddap/tabledap/icos... icos26na20170511SocatEnhanced

How to download data

How to download data with OPeNDAP

e.dataset_id="icos26na20170409SocatEnhanced"
e.constraints = None
opendap_url = e.get_download_url(response="opendap")
print(opendap_url)

https://erddap.icos-cp.eu/erddap/tabledap/icos26na20170409SocatEnhanced

How to download data as netCDF4

# with netCDF4 
from netCDF4 import Dataset
with Dataset(opendap_url) as nc:
   print(nc.summary)

The Integrated Carbon Observation System, ICOS, is a European-wide greenhouse gas research infrastructure. ICOS produces standardised data on greenhouse gas concentrations in the atmosphere, as well as on carbon fluxes between the atmosphere, the earth and oceans. This information is being used by scientists as well as by decision makers in predicting and mitigating climate change. The high-quality and open ICOS data is based on the measurements from over 140 stations across 12 European countries.

How to download data as Xarray

# with Xarray
e.dataset_id="icos26na20170409SocatEnhanced"
ds = e.to_xarray(decode_times=False)
ds
<xarray.Dataset>
Dimensions: (row: 5416)
Coordinates:
longitude (row) float32 -53.82 -53.84 -53.86 -53.88 ... 10.69 10.69 10.7
latitude (row) float32 66.85 66.83 66.81 66.79 ... 57.38 57.37 57.35 57.33
time (row) float64 1.492e+09 1.492e+09 ... 1.492e+09 1.492e+09
Dimensions without coordinates: row
Data variables:
Expocode (row) object '26NA20170409' '26NA20170409' ... '26NA20170409'
pCO2 (row) float32 nan nan nan nan nan nan ... nan nan nan nan nan nan
P_sal (row) object ...
Attributes: (12/71)
acquisition_ended_at_time: 2017-04-16T10:01:19Z
acquisition_started_at_time: 2017-04-09T02:58:53Z
acquisition_station_class: 1
acquisition_station_comment: The cargo ship M/S Nuka Arctica ...
acquisition_station_country_code: NO
acquisition_station_id: 26NA
... ...
subsetVariables: Expocode, depth2, version, SOCAT...
summary: The Integrated Carbon Observatio...
time_coverage_end: 2017-04-16T10:01:19.000Z
time_coverage_start: 2017-04-09T02:58:53.000Z
title: 26NA20170409_SOCAT_enhanced
Westernmost_Easting: -54.042
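
With decode_times=False the time coordinate stays numeric (1.492e+09 in the output above). Assuming the usual ERDDAP convention that time is stored as seconds since 1970-01-01T00:00:00Z, a value can be converted by hand as sketched below. epoch_to_utc is an illustrative helper, and the sample value was chosen to match the time_coverage_start attribute shown above.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert seconds since 1970-01-01T00:00:00Z to a timezone-aware datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Illustrative value: the epoch second matching time_coverage_start above.
print(epoch_to_utc(1491706733).isoformat())
```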

How to download data as Pandas

Here we extract only a few variables.

# with pandas
e.dataset_id="icos26na20170409SocatEnhanced"
e.constraints = None
e.protocol = "tabledap"
e.variables = ['time', 'Expocode', 'pCO2', 'P_sal']
df = e.to_pandas(
   index_col="time (UTC)",
   parse_dates=True,
)
df.head()
time (UTC) Expocode pCO2 (µatm) P_sal (psu)
2017-04-09 02:58:53+00:00 26NA20170409 NaN NaN
2017-04-09 03:03:22+00:00 26NA20170409 NaN NaN
2017-04-09 03:08:14+00:00 26NA20170409 NaN NaN
2017-04-09 03:12:42+00:00 26NA20170409 NaN NaN
2017-04-09 03:17:10+00:00 26NA20170409 NaN NaN
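
Behind a request like this one, a tabledap download is an ordinary URL: the dataset ID, a file extension for the response type, and the requested variables after the question mark. The sketch below assembles such a URL by hand under that assumption; build_tabledap_url is a hypothetical helper, it leaves out row constraints, and it is not claimed to match erddapy's output character for character.

```python
def build_tabledap_url(server, dataset_id, variables=(), response="csv"):
    """Hypothetical helper: assemble a tabledap request URL by hand."""
    url = f"{server.rstrip('/')}/tabledap/{dataset_id}.{response}"
    if variables:
        # Requested variables follow the "?" as a comma-separated list.
        url += "?" + ",".join(variables)
    return url


url = build_tabledap_url(
    "https://erddap.icos-cp.eu/erddap",
    "icos26na20170409SocatEnhanced",
    variables=["time", "Expocode", "pCO2", "P_sal"],
)
print(url)
```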