Google Analytics data extraction in R

Unlike other posts on this blog, this post focuses on coding in R, so readers with a developer mindset will probably get more out of it than pure business analysts.

My goal is to describe an alternative way to extract data from Google Analytics into R through the API. I have been using R for quite some time, but I think the GA library for R has been broken, and although it did receive an update, it does not seem to be widely used right now.

Given this, I decided to write the extraction code myself, since more and more data-related work is now being done in R.

Moreover, the available RGoogleAnalytics package is built for Linux only, and my Windows friends might like to have something that works for them as well.

OK, so let’s get started. It’s going to be quick and easy.

There are some prerequisites for GA data extraction in R:

  1. At least one domain must be registered with your Google Analytics account
  2. R installed with the RCurl and rjson packages, which provide the getURL() and fromJSON() functions used below

Steps to be followed for Google Analytics data extraction in R:

Set the Google Analytics query parameters for preparing the request URI:

To extract Google Analytics data, first define the query parameters, such as dimensions, metrics, start date, end date, segment, sort, and filters, as your analysis requires.

# Defining Google analytics search query parameters

# Set the dimensions and metrics
ga_dimensions <- 'ga:visitorType,ga:operatingSystem,ga:country'
ga_matrics <- 'ga:visits,ga:bounces,ga:avgTimeOnSite'

# Set the starting and ending date
startdate <- '2012-01-01'
enddate <- '2012-11-30'

# Set the segment, sort and filters
segment <- 'dynamic::ga:operatingSystem==Android'
sort <- 'ga:visits'
filters <- 'ga:visits>2'

Get the access token from OAuth 2.0 Playground

We will obtain the access token from the OAuth 2.0 Playground. The steps for generating it are:

  1. Go to the OAuth 2.0 Playground
  2. Select the Analytics API, click the Authorize APIs button, and sign in with the credentials of the related Google account
  3. Generate the access token by clicking Exchange authorization code for tokens, then assign it to the access_token variable in the provided R script
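
A minimal sketch of step 3, assuming you would rather not paste the token directly into the script. It reads the token from an environment variable; the name GA_ACCESS_TOKEN and the helper read_access_token() are hypothetical, and only the access_token variable is used by the rest of this post. Access tokens issued by the Playground are short-lived, so expect to repeat the steps above when a request starts failing.

```r
# Read the OAuth 2.0 access token from an environment variable.
# GA_ACCESS_TOKEN is a hypothetical name chosen for this sketch.
read_access_token <- function(var = "GA_ACCESS_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("No access token found in environment variable ", var)
  }
  token
}

# Example: the value below is a placeholder, not a real token.
Sys.setenv(GA_ACCESS_TOKEN = "paste-token-from-playground-here")
access_token <- read_access_token()
```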

Retrieve and select the profile

The code below retrieves the profiles registered under your Google Analytics account, which gives you the GA profile id you need. Make sure the access token is set before retrieving the profiles.

We retrieve the profiles by sending a request to the Management API with access_token as a parameter; it returns a JSON response. The following steps convert the response to a list and store it in the data frame object profiles.

# Load RCurl for getURL() and rjson for fromJSON()
library(RCurl)
library(rjson)

# Request the GA profiles and store the JSON response in GA.profiles.Json
GA.profiles.Json <- getURL(paste0("https://www.googleapis.com/analytics/v3/management/accounts/~all/webproperties/~all/profiles?access_token=", access_token))

# Convert the response variable GA.profiles.Json to a list
GA.profiles.List <- fromJSON(GA.profiles.Json, method='C')

# Pad every profile to the same number of fields, then take the
# profile id (field 1) and the profile name (field 7)
GA.profiles.param <- t(sapply(GA.profiles.List$items,
                              '[', 1:max(sapply(GA.profiles.List$items, length))))
profiles.id <- as.character(GA.profiles.param[, 1])
profiles.name <- as.character(GA.profiles.param[, 7])

# Store profiles.id and profiles.name in the profiles data frame
profiles <- data.frame(id=profiles.id,name=profiles.name)
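
The code above picks the id and name by position (fields 1 and 7), which depends on the field order of the response. As a hedged alternative, here is a sketch that looks the fields up by name instead. It runs on a mock list with made-up values standing in for the parsed response; get_field() and profiles_by_name are hypothetical names, and the 'ga:' prefix is added on the assumption that the ids should look like the ones used in the data request later on.

```r
# Mock of the parsed Management API response (values are made up).
mock_profiles <- list(items = list(
  list(id = "123456", kind = "analytics#profile", name = "abc.com"),
  list(id = "234567", kind = "analytics#profile", name = "xyz.com")
))

# Pull one named field out of every profile; NA when the field is absent.
get_field <- function(items, field) {
  vapply(items, function(item) {
    value <- item[[field]]
    if (is.null(value)) NA_character_ else as.character(value)
  }, character(1))
}

# Build the data frame from named fields, prefixing each id with 'ga:'.
profiles_by_name <- data.frame(
  id   = paste0("ga:", get_field(mock_profiles$items, "id")),
  name = get_field(mock_profiles$items, "name"),
  stringsAsFactors = FALSE
)
```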

We have stored the profile information in the profiles data frame, with the profile id and profile name. We can print the retrieved list with the following code:

profiles
OUTPUT::

         id       name
1 ga:123456    abc.com
2 ga:234567    xyz.com

We can retrieve Google Analytics data from only one GA profile at a time, so we need to define the profile id for which we want to retrieve the data. Select the relevant profile id from the output above and store it in the profileid variable for use later in the code.

# Set your google analytics profile id
profileid <- 'ga:123456'
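
As a small safeguard, the sketch below checks that a profile id has the 'ga:' prefix followed by digits, the form shown in the output above. The helper name is_valid_profile_id() is hypothetical.

```r
# TRUE when the id looks like 'ga:' followed by one or more digits.
is_valid_profile_id <- function(id) {
  grepl("^ga:[0-9]+$", id)
}

is_valid_profile_id("ga:123456")  # prefixed id
is_valid_profile_id("123456")     # bare id without the prefix
```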

Retrieving GA data

We request the data from the Google Analytics data feed API, passing the access token and all of the query parameters defined earlier: dimensions, metrics, start date, end date, segment, sort, and filters.

# start_index is the zero-based page counter. It is not defined elsewhere
# in this post, so 0 (the first page, rows 1 to 10000) is assumed here.
start_index <- 0

# Request URI for querying the Google Analytics data
GA.Data <- getURL(paste('https://www.googleapis.com/analytics/v3/data/ga?',
                        'ids=',profileid,
                        '&dimensions=',ga_dimensions,
                        '&metrics=',ga_matrics,
                        '&start-date=',startdate,
                        '&end-date=',enddate,
                        '&segment=',segment,
                        '&sort=',sort,
                        '&filters=',filters,
                        '&max-results=',10000,
                        '&start-index=',start_index*10000+1,
                        '&access_token=',access_token, sep='', collapse=''))
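
Characters such as > and = in filter and segment expressions are not safe to place in a URL verbatim. The sketch below shows one way to percent-encode each parameter value with base R's URLencode() before building the request URI. The helper build_query() is hypothetical and is not part of the original script.

```r
# Build a query string from a named list, percent-encoding each value.
build_query <- function(params) {
  encoded <- vapply(params, function(value) {
    utils::URLencode(as.character(value), reserved = TRUE)
  }, character(1))
  paste(names(params), encoded, sep = "=", collapse = "&")
}

# Example with a subset of the parameters defined earlier.
query <- build_query(list(
  ids     = "ga:123456",
  metrics = "ga:visits,ga:bounces",
  filters = "ga:visits>2"
))
request_uri <- paste0("https://www.googleapis.com/analytics/v3/data/ga?", query)
```

The resulting request_uri can be passed to getURL() in the same way as the pasted string above.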

This request returns a response body in JSON. To interpret the response values, we need to convert it to a list object.

# Convert the JSON data to the list object GA.list
GA.list <- fromJSON(GA.Data, method='C')
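
If the token has expired or a parameter is malformed, the API answers with an error object rather than data. The sketch below checks for that before going further. It assumes the error body carries a code and a message under a top-level error field, and it runs on mock lists with made-up values; check_ga_response() is a hypothetical helper.

```r
# Stop with a readable message if the parsed response carries an error.
check_ga_response <- function(response) {
  err <- response[["error"]]
  if (!is.null(err)) {
    stop("Google Analytics API error ", err$code, ": ", err$message,
         call. = FALSE)
  }
  invisible(response)
}

# Mock responses (made-up values) to exercise both paths.
ok_response  <- list(totalResults = 2, rows = list(c("a", "1"), c("b", "2")))
bad_response <- list(error = list(code = 401, message = "Invalid Credentials"))

check_ga_response(ok_response)
result <- tryCatch(check_ga_response(bad_response),
                   error = function(e) conditionMessage(e))
```

In the script, the call would be check_ga_response(GA.list) right after the conversion.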

Now it's easy to read the response parameters from this list object. For example, the total number of data rows is obtained with the following command:

# For getting the total number of the data rows
totalrow <-  GA.list$totalResults
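
Because each request asks for at most 10000 rows (the max-results value in the request), totalrow tells you how many requests are needed to fetch everything. The sketch below computes the start-index value for each page with the same formula as the request, start_index*10000+1. The helper page_start_indices() is hypothetical.

```r
# Compute the start-index value for every page of results.
page_start_indices <- function(total_rows, page_size = 10000) {
  if (total_rows <= 0) return(numeric(0))
  pages <- ceiling(total_rows / page_size)
  # Zero-based page counter times page size, plus one
  (seq_len(pages) - 1) * page_size + 1
}

page_start_indices(25000)  # three pages: 1, 10001, 20001
```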

Storing GA data in a data frame

We store the Google Analytics response data in an R data frame, which is more convenient for data visualization and data modeling in R.

# Combine the response rows into a matrix, one row per result
finalres <- do.call(rbind, GA.list$rows)

# Split ga_matrics into a vector
metrics_vec <- unlist(strsplit(ga_matrics,split=','))

# Split ga_dimensions into a vector
dimension_vec <- unlist(strsplit(ga_dimensions,split=','))

# Strip the 'ga:' prefix to get the dimension column names
ColnamesDimension <- gsub('ga:','',dimension_vec)

# Strip the 'ga:' prefix to get the metric column names
ColnamesMetric <- gsub('ga:','',metrics_vec)

# Combine dimension and metric column names into col_names
col_names <- c(ColnamesDimension,ColnamesMetric)
colnames(finalres) <- col_names

# Convert finalres to a data frame
GA.DF <- as.data.frame(finalres)
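
The API returns every cell as a string, so the metric columns are not numeric until you convert them. The sketch below shows one way to do the conversion, using mock rows with made-up values shaped like the response rows. The names mock_rows, dims, mets, and df are hypothetical.

```r
# Mock response rows: each row is a character vector (made-up values).
mock_rows <- list(
  c("New Visitor", "Android", "Australia", "3", "1", "106.0"),
  c("New Visitor", "Android", "Belgium", "3", "1", "155.33333333333334")
)
dims <- c("visitorType", "operatingSystem", "country")
mets <- c("visits", "bounces", "avgTimeOnSite")

# Bind the rows into a matrix, then into a data frame of strings.
df <- as.data.frame(do.call(rbind, mock_rows), stringsAsFactors = FALSE)
colnames(df) <- c(dims, mets)

# Convert only the metric columns to numeric; dimensions stay as text.
df[mets] <- lapply(df[mets], as.numeric)
```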

Finally, the retrieved data is stored in the GA.DF data frame. You can check its first rows with the following command:

head(GA.DF)
OUTPUT::
        visitorType operatingSystem   country visits bounces      avgTimeOnSite
1       New Visitor         Android Australia      3       1              106.0
2       New Visitor         Android   Belgium      3       1 155.33333333333334
3       New Visitor         Android    Poland      3       0               60.0
4       New Visitor         Android    Serbia      3       2 40.666666666666664
5       New Visitor         Android     Spain      3       1               43.0
6 Returning Visitor         Android (not set)      3       3                0.0

You will need the full R script to try this yourself; you can download it by clicking here. I am currently developing an R package that will help R users do the same task with less effort. If you are interested, leave your email address in a comment and we’ll get in touch.



12 Comments

  • Great post. I was able to get this working in about 30 minutes. You might be interested in a couple of R code modifications I used that reduce the lines of code needed to process the results.

    Instead of using a custom str_to_vector() function, I used a built-in R function called strsplit(). It returns a list, so the result must be unlisted:

    metrics_vec <- unlist(strsplit(ga_matrics,split=','))

    dimension_vec <- unlist(strsplit(ga_dimensions,split=','))

    Because R objects and methods are almost always vectorized, instead of looping over each element of metrics_vec and calling gsub at each iteration of the loop, I used:

    ColnamesMetric <- gsub('ga:','',metrics_vec)

    ColnamesDimension <- gsub('ga:','',dimension_vec)

    I cast the results of do.call() directly to a data.frame:

    GA.DF <- data.frame(do.call(rbind, GA.list$rows))
    colnames(GA.DF) <- c(ColnamesDimension,ColnamesMetric)

    Thanks,
    Glynn

    • Vignesh Prajapati
      February 22, 2013 4:18 pm

      Hi Glynn, 

      Thanks for trying this for GA data extraction and for providing alternative R code for optimized R operations. I have updated the script based on your suggestion. Keep reading our blog!

  • Sandor Caetano
    February 14, 2013 8:54 pm

    Nice post, but I’m having problems with the SSL certificate after “GA.profiles.Json <- getURL(paste(…"…

    "SSL certificate problem, verify that the CA cert is OK"

    Do you have any idea how to circumvent the problem?

  • Sandor Caetano
    February 14, 2013 9:54 pm

    Please forget my last post. I was able to make your code work after I downloaded the script.

  • help me:
    > install_github(“rga”, “skardhamar”)
    Installing github repo(s) rga/master from skardhamar
    Installing rga.zip from https://github.com/skardhamar/rga/zipball
    Installing rga
    Error: Command failed (1)
    >

    • Vignesh Prajapati
      February 22, 2013 3:59 pm

      Hi Ermelinda,

      I think you are trying to install the rga package from a broken link. Try this URL instead: https://github.com/skardhamar/rga/archive/master.zip

      Additionally, I would recommend the updated r-google-analytics package for GA data extraction in R, which is available for download at http://code.google.com/p/r-google-analytics/downloads/list.

      • Hi Vignesh:
        > install.packages(“c:\Users\Ermelinda\Desktop\rga-master(1).zip”, repos=NULL, type=”source”
        + )
        Warning in read.dcf(file.path(pkgname, “DESCRIPTION”), c(“Package”, “Type”)) :
          cannot open compressed file ‘rga-master(1)/DESCRIPTION’, probable reason ‘No such file or directory’
        Errore in read.dcf(file.path(pkgname, “DESCRIPTION”), c(“Package”, “Type”)) :
          non posso aprire questa connessione
        Warning messages:
        1: running command ‘C:/REVOLU~1/R-ENTE~2.1/R-214~1.2/bin/x64/R CMD INSTALL -l “C:/Revolution/R-Enterprise-Node-6.1/R-2.14.2/library”   “c:/Users/Ermelinda/Desktop/rga-master(1).zip”‘ had status 1
        2: In install.packages(“c:\Users\Ermelinda\Desktop\rga-master(1).zip”,  :
          installation of package ‘c:/Users/Ermelinda/Desktop/rga-master(1).zip’ had non-zero exit status

        • Vignesh Prajapati
          March 21, 2013 10:01 am

          Hi Ermelinda,

          The command you are using to install the rga package is not right. If you use install.packages(), you need to pass the names of packages that are available on CRAN (you do not need to pass the zip file here). Let me provide clear steps to achieve this.

          Steps for rga:
          1. Install the devtools package
          2. Load the devtools package
          3. Use the install_github() function to install the rga package from the GitHub repository

          You can find more details at https://github.com/skardhamar/rga#installation

          Again, I would suggest the RGoogleAnalytics package, which is available for both Linux and Windows, to extract Google Analytics data in R. The steps below, with related links, should help you explore it easily.

          Steps for RGoogleAnalytics:
          1. Download the RGoogleAnalytics package for your operating system
          https://code.google.com/p/r-google-analytics/downloads/list
          2. Download and install the dependency packages
          https://code.google.com/p/r-google-analytics/#Before_You_Begin
          3. Install the package from the command line
          - Command: R CMD INSTALL RGoogleAnalytics_1.2.zip (for Windows)
          - Command: R CMD INSTALL RGoogleAnalytics_1.2.tar.gz (for Linux)
          4. Run the sample script
          https://code.google.com/p/r-google-analytics/#Getting_Started
          http://decisionstats.com/2013/02/17/rgoogleanalytics-package-updated-works-for-oauth-2-0-rstats/

      • Hi
        I still have problems:
        > install.packages(“c:\Users\Ermelinda\Desktop\rga-master(1).zip”, repos=NULL, type=”source”
        + )
        Warning in read.dcf(file.path(pkgname, “DESCRIPTION”), c(“Package”, “Type”)) :
          cannot open compressed file ‘rga-master(1)/DESCRIPTION’, probable reason ‘No such file or directory’
        Errore in read.dcf(file.path(pkgname, “DESCRIPTION”), c(“Package”, “Type”)) :
          non posso aprire questa connessione
        Warning messages:
        1: running command ‘C:/REVOLU~1/R-ENTE~2.1/R-214~1.2/bin/x64/R CMD INSTALL -l “C:/Revolution/R-Enterprise-Node-6.1/R-2.14.2/library”   “c:/Users/Ermelinda/Desktop/rga-master(1).zip”‘ had status 1
        2: In install.packages(“c:\Users\Ermelinda\Desktop\rga-master(1).zip”,  :
          installation of package ‘c:/Users/Ermelinda/Desktop/rga-master(1).zip’ had non-zero exit status

  • Saurbh Pawar
    June 26, 2013 5:02 am

    Is there a way to extract visitor-level data through R? I mean getting visitor_id-level data to build models that predict visitor-level behavior, where data across sessions could be combined to form the input for the model.

  • Amar Gondaliya
    July 2, 2013 2:25 pm

    Hi Saurabh,

    You can extract visitor-level data through R, but the visitor_id needs to be stored in Google Analytics. You can store it using a visitor-level custom variable (see https://developers.google.com/analytics/devguides/collection/gajs/gaTrackingCustomVariables). Once the visitor_id is stored, you will be able to retrieve visitor-level data through R and perform the analysis you want.
    Regards,
    Amar
