MailChimp Export API; Working with JSON Data

Daniel Liden

2019/02/23

MailChimp Export API

I’m in the process of setting up regular exports of data from MailChimp to help Pocket Lobby keep track of user engagement. We want to integrate the MailChimp data with a number of other data sources, so getting the data out of MailChimp and into a format that can be linked with our other data sources is important. Ideally, we can automate as much of this as possible (rather than manually downloading the CSV files and adding them to a database). To that end, I spent a little bit of time getting up to speed with MailChimp’s export API, which allows for quick access to data about MailChimp lists (users) and campaigns (emails; including engagement information such as clicks and opens).

From the MailChimp export API documentation, “request data is passed to the API by POSTing JSON objects to the API endpoints with the appropriate parameters.” What does this mean? POST is an HTTP request method, meaning that, broadly speaking, it is a method that specifies that a particular action is to be taken on a particular resource. POST methods specifically request that a web server accept a certain set of data enclosed in the POST request. It is often used to tell the server to store the data.

In our case, the data we’re sending with the POST request is a set of parameters specifying the data we want to export. The parameters are sent as JSON objects. The API requests of interest are:

To export data about a particular user list:

curl --request POST \
--url 'https://<dc>.api.mailchimp.com/export/1.0/list/' \
--header 'content-type: application\json' \
--data '{"apikey": "<api-key>", "id": "<list-id>"}'

Where

To export data about an email campaign:

curl --request POST \
--url 'https://<dc>.api.mailchimp.com/export/1.0/campaignSubscriberActivity/' \
--header 'content-type: application\json' \
--data '{"apikey": "<api-key>", "id": "<campaign-id>"}'

Where

Of course, this approach to finding the campaign IDs is very time consuming will ultimately make this part of the data extraction process not worthwhile – I’m hoping (and assuming) there is a better way.

Further details about the parameters one can pass to the POST request can be found here. If we send the request as written above using curl, the output will be printed to the terminal. It’s more useful to redirect the output to a file using > – that is,

curl --request POST \
--url 'https://<dc>.api.mailchimp.com/export/1.0/campaignSubscriberActivity/' \
--header 'content-type: application\json' \
--data '{"apikey": "<api-key>", "id": "<campaign-id>"}' > data.json

That brings us to part 2 of this post.

Working with JSON data

The MailChimp export data are downloaded as JSON objects. What is JSON and how can we use it? JSON stands for JavaScript Object Notation. It is a text format for data that is specifically meant to be used as a “data-interchange language;” i.e., it is made to allow for data transfer between different programs and programming languages.

We won’t go into details about the syntax of JSON or the process of making JSON objects. A good tutorial can be found here. We’re interested in making these data usable and accessible by whatever language(s) we’re using the parse the data. In my case, I want to use JSON files with R and Python.

Using JSON data in R

We will be using the jsonlite R package. Documentation is available here. To begin with, I am going to convert one of the base R included datasets, mtcars, into JSON. this gives us an example to work with. We will just consider the first five rows.

library(jsonlite)
jsoncars = toJSON(mtcars[1:5,], pretty = TRUE)
jsoncars

[
  {
    "mpg": 21,
    "cyl": 6,
    "disp": 160,
    "hp": 110,
    "drat": 3.9,
    "wt": 2.62,
    "qsec": 16.46,
    "vs": 0,
    "am": 1,
    "gear": 4,
    "carb": 4,
    "_row": "Mazda RX4"
  },
  {
    "mpg": 21,
    "cyl": 6,
    "disp": 160,
    "hp": 110,
    "drat": 3.9,
    "wt": 2.875,
    "qsec": 17.02,
    "vs": 0,
    "am": 1,
    "gear": 4,
    "carb": 4,
    "_row": "Mazda RX4 Wag"
  },
  {
    "mpg": 22.8,
    "cyl": 4,
    "disp": 108,
    "hp": 93,
    "drat": 3.85,
    "wt": 2.32,
    "qsec": 18.61,
    "vs": 1,
    "am": 1,
    "gear": 4,
    "carb": 1,
    "_row": "Datsun 710"
  },
  {
    "mpg": 21.4,
    "cyl": 6,
    "disp": 258,
    "hp": 110,
    "drat": 3.08,
    "wt": 3.215,
    "qsec": 19.44,
    "vs": 1,
    "am": 0,
    "gear": 3,
    "carb": 1,
    "_row": "Hornet 4 Drive"
  },
  {
    "mpg": 18.7,
    "cyl": 8,
    "disp": 360,
    "hp": 175,
    "drat": 3.15,
    "wt": 3.44,
    "qsec": 17.02,
    "vs": 0,
    "am": 0,
    "gear": 3,
    "carb": 2,
    "_row": "Hornet Sportabout"
  }
]

Now we have a new object, jsoncars, with class json. Converting this back to a dataframe is just as easy:

dfcars = fromJSON(jsoncars)
dfcars

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

This approach to convert JSON to data frames should work fairly straightforwardly in most circumstances. With this conversion made, we can proceed as we would with any other data frame.

Using JSON data in Python

Here we will use the JSON functions included in the pandas package. We will reproduce the steps above. We will import the same dataset as above from this repository of R datasets.

import pandas as pd
cars = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv',
index_col = 0)
cars = cars.iloc[0:5]
cars_json = cars.to_json()
print(cars_json)
Python 3.8.1 (default, Dec 21 2019, 20:57:38)
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/babel-6ODOoq/python-npcT1h", line 1, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'
python.el: native completion setup loaded

You’ll notice that the format of the output is quite different. For one thing, there wasn’t a simple “pretty = TRUE” option here as there was in R (though I’m sure the same could be accomplished with a small amount of extra work). There are also quite a few different ways to format the same data as a JSON object. Given that JSON is supposed to be quite good for data interchange, I’m assuming these formats unambiguously reflect a particular data structure.

Anyway, our last step is to convert this python JSON object back to a data frame. This is even easier than before. Note, however, that the output of the above is not an object of type json; it is a string, and it does not have a read_json method, so the syntax is a little different.

cars_df = pd.read_json(cars_json)
print(cars_df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/babel-6ODOoq/python-v5pxTL", line 1, in <module>
    cars_df = pd.read_json(cars_json)
NameError: name 'pd' is not defined

Aside from some formatting issues resulting from some difficulties between Python and emacs (I’m still trying to figure out how to get clean Pandas data output to show up), we got what we were looking for and we can use this data frame object for whatever analyses or further data transformation and transfer we need.