How to extract Google Analytics data in R using RGoogleAnalytics

I am thrilled to announce that RGoogleAnalytics was recently released on CRAN. R is already a Swiss Army knife for data analysis, largely due to its 6,000 packages. This means that digital analysts can now use the full analytical capabilities of R to explore their Google Analytics data. In this post, we will go through the basics of RGoogleAnalytics. Let’s begin.

Fire up your favorite R IDE and install RGoogleAnalytics. Installation is pretty basic. If you are new to RGoogleAnalytics, refer to this page to learn how to install it.

Since RGoogleAnalytics uses the Google Analytics Core Reporting API under the hood, every request to the API has to be authorized under the OAuth 2.0 protocol. This requires an initial setup: registering an app with the Google Analytics API so that you get a unique set of project credentials (Client ID and Client Secret). Here’s how to do this:

  • To proceed, you will first be asked to configure the consent screen.
  • After configuring the consent screen, select Application Type: Other in the next step and click Create.

[Figure: credentials]

  • The above step generates an OAuth client, as shown below.

[Figure: OAuth client]

  • Once your Client ID and Client Secret are created, copy them to your R Script.
  • Enable the Google Analytics API from the API Manager.

Once the project is configured and the credentials are ready, you need to authenticate your Google Analytics account with your app. This ensures that your app (the R script) can access your Google Analytics data, list your Google Analytics profiles, and so on. Once authenticated, you get a pair of tokens (an Access Token and a Refresh Token). The Access Token is sent with each API request so that Google’s servers know that the requests came from your app and are authentic. Access Tokens expire after 60 minutes, so they need to be regenerated using the Refresh Token. I will show you how to do that, but first let’s continue with the data extraction flow.

library(RGoogleAnalytics)

# Authorize the Google Analytics account
# This need not be executed in every session once the token object is created 
# and saved
client.id <- "xxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com"
client.secret <- "xxxxxxxxxxxxxxxd_TknUI"
token <- Auth(client.id,client.secret)

# Save the token object for future sessions
save(token,file="./token_file")
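
If you would rather not paste the Client ID and Client Secret directly into a script, one option is to read them from environment variables. The sketch below is only an illustration: the helper get_credential and the variable names GA_CLIENT_ID and GA_CLIENT_SECRET are my own assumptions, not part of RGoogleAnalytics.

```r
# Hypothetical helper: read a credential from an environment variable
# and fail loudly if it is missing, instead of passing "" to Auth().
get_credential <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# Usage with the code above (variable names are assumptions; not run here):
# client.id     <- get_credential("GA_CLIENT_ID")
# client.secret <- get_credential("GA_CLIENT_SECRET")
# token <- Auth(client.id, client.secret)
```

The variables can be set in the shell or in a ~/.Renviron file, which keeps the secret out of any script you share.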

The next step is to get the Profile ID/View ID of the Google Analytics profile for which the data extraction is to be carried out. It can be found within the Admin Panel of the Google Analytics UI. This profile ID maps to the table.id argument below.

The code below builds a query from the standard query parameters (start date, end date, dimensions, metrics, etc.) and sends it to the Google Analytics API. The API response is converted into an R data frame.

# Get the Sessions & Transactions for each Source/Medium sorted in 
# descending order by the Transactions

query.list <- Init(start.date = "2014-08-01",
                   end.date = "2014-09-01",
                   dimensions = "ga:sourceMedium",
                   metrics = "ga:sessions,ga:transactions",
                   max.results = 10000,
                   sort = "-ga:transactions",
                   table.id = "ga:123456")

# Create the Query Builder object so that the query parameters are validated
ga.query <- QueryBuilder(query.list)

# Extract the data and store it in a data-frame
ga.data <- GetReportData(ga.query, token)

# Sanity Check for column names
dimnames(ga.data)

# Check the size of the API Response
dim(ga.data)
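
Once the data is in a data frame, the usual R tooling applies. As a rough sketch, here is how a conversion rate per source/medium could be computed. The data frame below is mock data standing in for the API response, and I am assuming the columns are named sourceMedium, sessions and transactions; check the output of dimnames(ga.data) for the actual names.

```r
# Mock data standing in for ga.data; values and column names are assumptions
ga.data <- data.frame(
  sourceMedium = c("google / organic", "(direct) / (none)", "newsletter / email"),
  sessions     = c(1200, 800, 150),
  transactions = c(36, 12, 9),
  stringsAsFactors = FALSE
)

# Coerce metrics to numeric in case they arrive as character
ga.data$sessions     <- as.numeric(ga.data$sessions)
ga.data$transactions <- as.numeric(ga.data$transactions)

# Conversion rate per source/medium, in percent
ga.data$conversion.rate <- round(100 * ga.data$transactions / ga.data$sessions, 2)

# Sort by conversion rate, highest first
ga.sorted <- ga.data[order(-ga.data$conversion.rate), ]
print(ga.sorted)
```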

In future sessions, you need not generate the Access Token every time. Assuming that you have saved it to a file, it can be loaded via the following snippet:

load("./token_file")

# Validate and refresh the token
ValidateToken(token)
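
One thing to keep in mind is that load() restores objects into the current workspace and silently overwrites anything with the same name. A slightly more defensive variant, sketched below, loads the file into a private environment and returns only the token. The helper load_token is a hypothetical name of my own, and it assumes the file was written with save(token, file = ...) as shown earlier.

```r
# Hypothetical helper: load a saved token without touching the global workspace.
# Assumes the file contains an object named "token".
load_token <- function(path = "./token_file") {
  if (!file.exists(path)) {
    stop("No token file at ", path, "; run Auth() first", call. = FALSE)
  }
  env <- new.env()
  load(path, envir = env)
  if (!exists("token", envir = env, inherits = FALSE)) {
    stop("File ", path, " does not contain an object named 'token'", call. = FALSE)
  }
  get("token", envir = env)
}

# Usage with the package functions from this post (not run here):
# token <- load_token("./token_file")
# ValidateToken(token)
```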

Here are a few practices that you might find useful:

  • Before querying for a set of dimensions and metrics, you might want to check whether they are compatible. This can be done using the Dimensions and Metrics Explorer.
  • The Query Feed Explorer lets you try out different queries in the browser; you can then copy the query parameters to your R script. It can be found here. I have found this to be a huge time-saver for debugging failed queries.
  • If the API returns an error, here’s a guide to understanding the cryptic error responses.
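
When a query runs inside a longer script, it can help to catch the error rather than let it stop everything. Below is a minimal sketch using base R's tryCatch. The wrapper safe_query is a hypothetical helper of my own, not something RGoogleAnalytics provides.

```r
# Hypothetical wrapper: run a query function, report any error message,
# and return NULL instead of aborting the script.
safe_query <- function(fetch) {
  tryCatch(
    fetch(),
    error = function(e) {
      message("Query failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage with the objects from this post (not run here):
# ga.data <- safe_query(function() GetReportData(ga.query, token))
# if (is.null(ga.data)) {
#   # inspect the message above, fix the query parameters, and retry
# }
```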

Did you find RGoogleAnalytics useful? Please leave your comments below. If you have a feature request or want to file a bug, please use this link.

Editor’s Note: This blog has been updated on 03/01/2018 for increased accuracy, taking into consideration all the official tech updates.

Dikesh Jariwala

Dikesh is a software developer and has been a pioneer in building some cool and fun code at Tatvic. Dikesh loves gaming and is a Counter-Strike champion, at least at Tatvic.