rbagd.eu

home about contacts

Downloading data with SDMX protocol from R

28 Mar 2015

About SDMX

SDMX stands for Statistical Data and Metadata Exchange. It is becoming a widely used standard for economic and population data providers such as ECB, Eurostat, OECD and many others.

Currently, most widely used SDMX implementation for economic data is known as SDMX-ML̀ where data is exchanged via structured XML documents. All providers I have extensively tested implement SDMX protocol via RESTful API. Queries for data may be of two major types: data itself or description of data (metadata).

 Data

Let's start with data itself. Typically, an SDMX-ML URL for obtaining XML document with data will consist of 4 parts:

To finish up, here's the link to the XML data file for the above Euribor and oil prices. It suffices to concatenate the listed elements separated with a backslash.

Most of the work with SDMX goes into key construction. Unfortunately, there is no single standard of constructing keys so that frequency or seasonal adjustment dimensions may appear in the beginning or at the end of key depending on the data provider, or even the dataset. Here's why it's useful to be able to study metadata for each dataset first.

Metadata

M in SDMX stands for metadata and thus additional information on series can also be exchanged in a structured manner via Data Structure Documents (DSD). DSD queries can be much simpler and require only two elements: a prefix and a dataset. I list here below a sample link to a DSD structure of one of the datasets for each of the 3 providers. You can easily infer the query structure.

Let's take an example. This link requests an XML document from OECD describing data in Government debt dataset. Analyzing this structure allows to construct more accurate data queries.

In particular, we are interested in two elements: CodeLists and Dimension. Dimension element explains in what order and how key for queries is constructed while CodeLists provides possible values for query elements. In this particular case, Dimension elements indicates that a key is constructed as follows: country, debt type, frequency, unit and variable. Thus, taking a look at CodeLists element suggests that a possible key would be FIN+BEL.AMT+GRS+NET.A.USD.3 which requests total Finnish and Belgian government debt flows and stocks in US Dollars at annual frequency. Here's the full link to the XML file. It turns out that only AMT (outstanding amounts) data is provided.

 SDMX and R

I cannot help but mention R package rsdmx which works especially well with above listed institutions and is meant to become an SDMX-ML format abstraction library. I suggest to get it directly from Github instead of CRAN. Lead maintainer and developer of rsdmx, Emmanuel Blondel, is very responsive and package is routinely improved. Kudos to him. Its usage is extremely simple. Just construct or find a valid SDMX query, whether for data or data structure, run readSDMX() from within R and in case your object is of appropriate SDMX class, as.data.frame method for that object is available. With some work, key construction and data download can be fully automated from R.

To finish up, here's a small illustration in R to obtain a piece of data and optionally convert it to a timestamped dataframe.

library(rsdmx)
url <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/GOV_DEBT/FIN+BEL.AMT.A.USD.3"
xml_data <- readSDMX(url)
data <- as.data.frame(xml_data)

library(reshape2)
dcast(data, obsTime ~ COU+DTYP, value.var="obsValue")

Dealing with public data has never been as fun. SDMX is a great protocol for easily constructing and maintaining public data you are interested in. No more random Excel or csv files. Amen.