Currently, most widely used
SDMX implementation for economic data is known as
SDMX-ML̀ where data is exchanged via structured
XML documents. All providers I have extensively tested implement
SDMX protocol via
RESTful API. Queries for data may be of two major types: data itself or description of data (metadata).
Let’s start with data itself. Typically, an
SDMX-ML URL for obtaining
XML document with data will consist of 4 parts:
- prefix link to the data warehouse of the data provider. Some examples would be:
- name of a dataset (commonly referred to as dataflow). Here are two of many possible examples for each of the above providers:
demo_pjan(population on 1 January),
teicp000(harmonized price indices)
ULC_EEQ(unit labour costs),
I don’t know any other way of discovering datasets other than browsing through the website of data provider. ECB has a link with all dataflows, however I couldn’t locate anything similar for other institutions.
- key which allows to identify a particular series within a dataset. Usually, it includes frequency of series, sector, country or type if available, indicates whether series are seasonally adjusted and so on. Let’s analyze a single example:
This is a valid key to query
FM(financial markets) dataset from ECB data warehouse. First, the key specifies that we want monthly data with
U2stands for Euro area code while
EURstands for euro currency. Three sequential dots are actually wildcards that will allow multiple series selection. For Euribor rates, these three dots
...can be replaced by
.RT.MM.(Reuters as source agency, money market). In case of oil prices, it’s
CYstands for commodity and
4Ffor another data provider, though I am unable to identify which one. Obviously,
EURIBOR3MD_+OILBRNIindicates that we would like to download Euribor rates and Brent oil prices. Finally,
HSTAspecifies that queried observation is a historical close, average of observations through period. By using wildcards as shown, i.e. skipping a part of the key while keeping the dot, one can select all possibilities in that part of the query, thus one could also download the whole dataset. Key selection supports
ORoperator, so that
M+Dwould select all monthly and daily obervations.
- additional parameters such as detail level or start/end period (options depend on the implementation). Usually, one can specify
some_key/?startPeriod=2000&endPeriod=2005. Here, some additional flexibility may be allowed, for example downloading only newly updated or latest data. One should consult data provider’s documentation for extra details.
To finish up, here’s the link to the
XML data file for the above Euribor and oil prices. It suffices to concatenate the listed elements separated with a backslash.
Most of the work with
SDMX goes into key construction. Unfortunately, there is no single standard of constructing keys so that frequency or seasonal adjustment dimensions may appear in the beginning or at the end of key depending on the data provider, or even the dataset. Here’s why it’s useful to be able to study metadata for each dataset first.
SDMX stands for metadata and thus additional information on series can also be exchanged in a structured manner via
Data Structure Documents (DSD). DSD queries can be much simpler and require only two elements: a prefix and a dataset. I list here below a sample link to a DSD structure of one of the datasets for each of the 3 providers. You can easily infer the query structure.
Let’s take an example. This link requests an
XML document from OECD describing data in Government debt dataset. Analyzing this structure allows to construct more accurate data queries.
In particular, we are interested in two elements:
Dimension element explains in what order and how key for queries is constructed while
CodeLists provides possible values for query elements. In this particular case,
Dimension elements indicates that a key is constructed as follows: country, debt type, frequency, unit and variable. Thus, taking a look at
CodeLists element suggests that a possible key would be
FIN+BEL.AMT+GRS+NET.A.USD.3 which requests total Finnish and Belgian government debt flows and stocks in US Dollars at annual frequency. Here’s the full link to the
XML file. It turns out that only
AMT (outstanding amounts) data is provided.
##### SDMX and
I cannot help but mention
rsdmx which works especially well with above listed institutions and is meant to become an
SDMX-ML format abstraction library. I suggest to get it directly from Github instead of CRAN. Lead maintainer and developer of
rsdmx, Emmanuel Blondel, is very responsive and package is routinely improved. Kudos to him. Its usage is extremely simple. Just construct or find a valid
SDMX query, whether for data or data structure, run
readSDMX() from within
R and in case your object is of appropriate
as.data.frame method for that object is available. With some work, key construction and data download can be fully automated from
To finish up, here’s a small illustration in
R to obtain a piece of data and optionally convert it to a timestamped dataframe.
library(rsdmx) url <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/GOV_DEBT/FIN+BEL.AMT.A.USD.3" xml_data <- readSDMX(url) data <- as.data.frame(xml_data) library(reshape2) dcast(data, obsTime ~ COU+DTYP, value.var="obsValue")
Dealing with public data has never been as fun.
SDMX is a great protocol for easily constructing and maintaining public data you are interested in. No more random
csv files. Amen.