I am thrilled to announce that RGoogleAnalytics was recently released on CRAN. R is already a Swiss Army knife for data analysis, largely due to its 6000 packages. What this means is that digital analysts can now use the full analytical capabilities of R to explore their Google Analytics data. In this post, we will go through the basics of RGoogleAnalytics. Let’s begin.
Fire up your favorite R IDE and install RGoogleAnalytics. Installation is pretty basic. In case you are new to RGoogleAnalytics, refer to this post to learn how to install it.
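For reference, here is a minimal sketch of the installation step from within an R session. The helper name ensure_package is made up for this example and is not part of RGoogleAnalytics; install.packages and requireNamespace are standard R functions.

```r
# Hypothetical helper (not part of RGoogleAnalytics): install a package
# from CRAN only if it is not already available, then report whether
# it can be loaded.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Only attempt the install when running interactively, so that sourcing
# this script in a batch job does not try to reach CRAN.
if (interactive()) {
  ensure_package("RGoogleAnalytics")
}
```

Once the package is available, it can be attached with require(RGoogleAnalytics) as usual.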
Since RGoogleAnalytics uses the Google Analytics Core Reporting API under the hood, every request to the API has to be authorized under the OAuth 2.0 protocol. This requires an initial setup: registering an app with the Google Analytics API so that you get a unique set of project credentials (Client ID and Client Secret). Here’s how to do this:
- Navigate to the Google Developers Console
- Create a New Project and Open it
- Navigate to APIs and ensure that the Analytics API is turned On for your project
- Navigate to Credentials and create a New Client ID
- Select Application Type – Installed Application
- Once your Client ID and Client Secret are created, copy them to your R Script.
    require(RGoogleAnalytics)

    # Authorize the Google Analytics account
    # This need not be executed in every session once the token object is created
    # and saved
    client.id <- "xxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com"
    client.secret <- "xxxxxxxxxxxxxxxd_TknUI"
    token <- Auth(client.id, client.secret)

    # Save the token object for future sessions
    save(token, file = "./token_file")
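If you would rather not paste the Client ID and Client Secret directly into a script, one option is to read them from environment variables. The sketch below is only an illustration: the helper read_ga_credentials and the variable names GA_CLIENT_ID and GA_CLIENT_SECRET are assumptions made for this example, not something RGoogleAnalytics defines.

```r
# Hypothetical helper: read the OAuth credentials from environment
# variables instead of hard-coding them in the script. The variable
# names GA_CLIENT_ID and GA_CLIENT_SECRET are assumed for this example.
read_ga_credentials <- function() {
  client.id <- Sys.getenv("GA_CLIENT_ID")
  client.secret <- Sys.getenv("GA_CLIENT_SECRET")
  # Sys.getenv returns "" for variables that are not set
  if (!nzchar(client.id) || !nzchar(client.secret)) {
    stop("Set GA_CLIENT_ID and GA_CLIENT_SECRET before authorizing")
  }
  list(client.id = client.id, client.secret = client.secret)
}

# Usage (needs RGoogleAnalytics and real credentials, so it is commented out):
# creds <- read_ga_credentials()
# token <- Auth(creds$client.id, creds$client.secret)
```

The variables can be set for the current session with Sys.setenv, or in your .Renviron file so that they are available every time R starts.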
The next step is to get the Profile ID/View ID of the Google Analytics profile from which the data is to be extracted. It can be found within the Admin Panel of the Google Analytics UI. This profile ID, prefixed with ga:, maps to the table.id argument below.
The code below builds a query from the Standard Query Parameters (Start Date, End Date, Dimensions, Metrics, etc.) and sends it to the Google Analytics API. The API response is converted into an R data frame.
    # Get the Sessions & Transactions for each Source/Medium sorted in
    # descending order by the Transactions
    query.list <- Init(start.date = "2014-08-01",
                       end.date = "2014-09-01",
                       dimensions = "ga:sourceMedium",
                       metrics = "ga:sessions,ga:transactions",
                       max.results = 10000,
                       sort = "-ga:transactions",
                       table.id = "ga:123456")

    # Create the Query Builder object so that the query parameters are validated
    ga.query <- QueryBuilder(query.list)

    # Extract the data and store it in a data-frame
    ga.data <- GetReportData(ga.query, token)

    # Sanity Check for column names
    dimnames(ga.data)

    # Check the size of the API Response
    dim(ga.data)
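Once the response is in a data frame, ordinary R functions apply to it. The sketch below uses a small made-up data frame in place of a real API response (the values are invented purely for illustration). It assumes the columns are named sourceMedium, sessions and transactions; check the output of dimnames(ga.data) for the actual names in your own response.

```r
# Made-up stand-in for an API response; the numbers are illustrative only
ga.sample <- data.frame(
  sourceMedium = c("google / organic", "(direct) / (none)", "newsletter / email"),
  sessions     = c(1200, 800, 150),
  transactions = c(24, 20, 6),
  stringsAsFactors = FALSE
)

# Conversion rate per source/medium, as a percentage of sessions
ga.sample$conversion.rate <- 100 * ga.sample$transactions / ga.sample$sessions

# Re-sort by conversion rate, highest first
ga.sample <- ga.sample[order(-ga.sample$conversion.rate), ]
print(ga.sample)
```

Sorting by a derived column like this is something the sort query parameter cannot do, since the rate only exists after the data reaches R.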
In future sessions, you need not generate the Access Token every time. Assuming that you have saved it to a file, it can be loaded via the following snippet:
    load("./token_file")

    # Validate and refresh the token
    ValidateToken(token)
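The two cases (first run versus later sessions) can be folded into one helper. This is a rough sketch rather than part of the package: get_token and its create.token argument are hypothetical names, and the function that creates the token is passed in so the example does not depend on real credentials.

```r
# Hypothetical helper: reuse a saved token when the file exists,
# otherwise create a new one with the supplied function and save it.
# In practice create.token could be
#   function() Auth(client.id, client.secret)
get_token <- function(path, create.token) {
  if (file.exists(path)) {
    env <- new.env()
    load(path, envir = env)  # restores the object that was saved as "token"
    return(env$token)
  }
  token <- create.token()
  save(token, file = path)
  token
}
```

After token <- get_token("./token_file", ...), you would still call ValidateToken(token) before querying.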
Here are a few practices that you might find useful:
- Before querying for a set of dimensions and metrics, you might want to check whether they are compatible. This can be done using the Dimensions and Metrics Explorer.
- The Query Feed Explorer lets you try out different queries in the browser; you can then copy the query parameters to your R Script. It can be found here. I have found this to be a huge time-saver for debugging failed queries.
- If the API returns an error, here’s a guide to understanding the cryptic error responses.
Did you find RGoogleAnalytics useful? Please leave your comments below. If you have a feature request or want to file a bug, please use this link.