PHP code to check existence of Google Analytics on a webpage and extract UA-ID

Adding the Google Analytics (GA) tracking code to a website is easy. But how do you find out whether a given page actually has Google Analytics implemented?

To answer that, you first need a clear picture of what the Google Analytics code looks like when it is implemented on a webpage. Below is an example of the GA script on www.liftsuggest.com:

[Screenshot: GA script on www.liftsuggest.com]

If you look at a few more pages, you will notice that a few phrases are common to every GA script. We can take advantage of this: the presence of these phrases on a webpage tells us that Google Analytics is there. The phrases we will look for are:

  • ga.js
  • _trackPageview
  • UA-ID

If all three of these phrases are present on a webpage, we can say that GA is implemented.
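
As a rough illustration of this idea, the sketch below tests a string of HTML for the three phrases. The function name has_ga_phrases and the sample snippet (including the ID UA-12345-1) are made up for this example; the UA-ID pattern is the same one used in the code later in this post.

```php
<?php
// Hypothetical helper (not part of the original script): returns true only
// when all three phrases appear somewhere in the given HTML.
function has_ga_phrases(string $html): bool
{
    $has_ga_js     = stripos($html, 'ga.js') !== false;
    $has_trackview = stripos($html, '_trackPageview') !== false;
    // A UA-ID looks like UA-<account digits>-<property digits>.
    $has_ua_id     = preg_match('/UA-[0-9]{5,}-[0-9]{1,}/', $html) === 1;

    return $has_ga_js && $has_trackview && $has_ua_id;
}

// Made-up sample page with a classic ga.js snippet and a placeholder ID.
$sample = "<script>var _gaq = _gaq || [];"
        . "_gaq.push(['_setAccount', 'UA-12345-1']);"
        . "_gaq.push(['_trackPageview']);"
        . "ga.src = 'http://www.google-analytics.com/ga.js';</script>";

var_dump(has_ga_phrases($sample));              // sample contains all three phrases
var_dump(has_ga_phrases('<p>No tracking</p>')); // none of the phrases present
```

A plain substring test like this is only a first approximation: it does not care where on the page the phrases occur, which is why the full script further down restricts the first two checks to <script> tags.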

Why do we need to check whether GA is implemented or not?

Consider a scenario. If GA is implemented on only one or two pages of a website, the owner usually knows about it. But what if a website has hundreds or thousands of pages and all of them must have GA implemented? How do you make sure that every page carries the GA script? Checking this is essential for getting proper analytics data.

To detect GA and extract the UA-ID for a single page, we follow the steps below.

  1. Set the URL for which we want to extract the Google Analytics UA-ID
  2. Grab the content of the URL using cURL
  3. Extract all the script tags from the content using a regular expression (because the GA script is always placed between <script> and </script> tags)
  4. Check for ga.js and _trackPageview in every <script> tag (the code below also accepts the analytics.js and gtag() equivalents)
  5. Extract the UA-ID using a regular expression
  6. Check whether all three phrases are present

As the overview shows, detecting GA comes down to grabbing the content of a webpage and running a few regular expressions on it. Choosing the right regular expressions, however, is very important.
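
To see what the two regular expressions do on their own, here is a small sketch that runs them against a hard-coded page fragment instead of a downloaded page. The fragment and the ID UA-12345-1 are invented for the example; the patterns are the ones used in the full script in this post.

```php
<?php
// Made-up page fragment: one unrelated script and one classic GA snippet.
$html = '<html><head>'
      . '<script type="text/javascript">var x = 1;</script>'
      . '<script type="text/javascript">'
      . "var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-12345-1']);"
      . '</script>'
      . '</head></html>';

// Same patterns as in the full script, written with single quotes so that
// PHP passes the backslashes to the regex engine unchanged.
$script_regex = '/<script\b[^>]*>([\s\S]*?)<\/script>/i';
$ua_regex     = '/UA-[0-9]{5,}-[0-9]{1,}/';

// $scripts[0] holds the full <script>...</script> tags, $scripts[1] the bodies.
$script_count = preg_match_all($script_regex, $html, $scripts);

// $ids[0] holds every UA-ID found anywhere in the page.
preg_match_all($ua_regex, $html, $ids);

echo "script tags found: $script_count\n";
echo "UA-IDs found: " . implode(', ', $ids[0]) . "\n";
```

Note that preg_match_all() always fills its third argument with an array of arrays, so the matches themselves live in index 0 (and in index 1 for the first capture group).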

How to do this using PHP code?

The code below follows the steps listed above.

Create a PHP file named Ga_track.php and add the following code.

<?php

class Ga_track
{
    function get_ga_implemented($url)
    {
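    $options = array(
        CURLOPT_RETURNTRANSFER => TRUE, // return the page as a string
        CURLOPT_HEADER => FALSE, // don't return headers
        CURLOPT_ENCODING => "", // handle all encodings
        CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 6.1; WOW64)", // who am i
        CURLOPT_SSL_VERIFYHOST => FALSE, // skip SSL host check (fine for a quick audit, not for sensitive data)
        CURLOPT_SSL_VERIFYPEER => FALSE, // skip SSL peer check (fine for a quick audit, not for sensitive data)
        CURLOPT_NOBODY => FALSE, // fetch the body, not just the headers
    );

    $ch = curl_init($url);
    curl_setopt_array($ch, $options);

    //2> Grab content of the url using CURL
    $content = curl_exec($ch);

    // curl_exec() returns FALSE on failure; treat that as "GA not found"
    if ($content === FALSE)
        return(NULL);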
   
    $flag1_trackpage = false; //Flag for the phrase '_trackPageview'
    $flag2_ga_js = false; //Flag for the phrase 'ga.js'

    // Script Regex
    $script_regex = "/<script\b[^>]*>([\s\S]*?)<\/script>/i"; 

    // UA_ID Regex
    $ua_regex = "/UA-[0-9]{5,}-[0-9]{1,}/";

    // Preg Match for Script
    //3> Extract all the script tags of the content
    preg_match_all($script_regex, $content, $inside_script);

    //4> Check for ga.js, analytics.js, gtag() and _trackPageview in all <script> tag
    for ($i = 0; $i < count($inside_script[0]); $i++)
    {
        $script = $inside_script[0][$i];

        // Library reference: ga.js, analytics.js or the gtag() helper
        if (stristr($script, "ga.js") || stristr($script, "function gtag()") || stristr($script, "analytics.js"))
            $flag2_ga_js = TRUE;

        // Pageview call for each of the three snippet styles
        if (stristr($script, "_trackPageview") || stristr($script, "gtag('js', new Date())")
            || stristr($script, "ga('send', 'pageview')"))
            $flag1_trackpage = TRUE;
    }

    // Preg Match for UA ID
    //5> Extract UA-ID using regular expression
    preg_match_all($ua_regex, $content, $ua_id);

    //6> Check whether all 3 word phrases are present or not.
    // $ua_id[0] holds the matched IDs, so count that, not the outer array.
    if ($flag2_ga_js && $flag1_trackpage && count($ua_id[0]) > 0)
        return($ua_id);
    else
        return(NULL);
    }

}

$ga_obj = new Ga_track();
//1> Set a url for which we have to extract  UA-ID
$url = "url for which we have to extract ID "; 


//===========Block 2===========//
/*You can also build the array of URLs from a database as below
(mysqli is used because the old mysql_* functions were removed in PHP 7),
set_time_limit(0);
$urls = array();
$con = new mysqli("localhost", "username", "password", "database");
if ($con->connect_error)
{
    die('Could not connect: ' . $con->connect_error);
}

$result = $con->query("SELECT url_field FROM table");

while ($row = $result->fetch_assoc())
{
    $urls[] = $row['url_field'];
}
$con->close();

foreach ($urls as $url)
{
    Copy block 1 here.
}
*/
//===========Block 2 over===========//

//===========Block 1===========//
$ua_id = NULL; // stays NULL if the call below fails
try{
    $ua_id = $ga_obj->get_ga_implemented($url); //Call to a function to extract details
}catch(Exception $e) {
    echo 'Message: ' .$e->getMessage();
    print "error in the function to extract details";
}
if ($ua_id == NULL)
{
    echo "<br/>Google Analytics is not implemented";
}
else
{
    echo "<pre>";
    print_r($ua_id);
    echo "</pre>";
    echo "<br/>Google Analytics is implemented.";
}
//===========Block 1 over===========//

?>
If you have multiple pages, you can store their URLs in a database or in an array and repeat the steps above in a loop, taking one URL at a time (see Block 2).
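
For the looping case, one possible shape is sketched below: a helper that runs a checker over a list of URLs and collects the ones where GA was not found. The name urls_missing_ga, the example.com URLs and the stand-in checker are all hypothetical; the only thing taken from the code above is the convention that the check returns NULL when GA is missing.

```php
<?php
// Hypothetical helper: runs a checker over a list of URLs and returns the
// ones for which the checker found nothing. $check is any callable that
// takes a URL and returns NULL when GA is missing, as get_ga_implemented() does.
function urls_missing_ga(array $urls, callable $check): array
{
    $missing = array();
    foreach ($urls as $url) {
        if ($check($url) === null) {
            $missing[] = $url;
        }
    }
    return $missing;
}

// Stand-in checker for the demo, so the sketch runs without network access.
// In real use you would pass array($ga_obj, 'get_ga_implemented') instead.
$fake_results = array(
    'http://example.com/'      => array(array('UA-12345-1')),
    'http://example.com/about' => null,
);
$fake_check = function ($url) use ($fake_results) {
    return $fake_results[$url];
};

print_r(urls_missing_ga(array_keys($fake_results), $fake_check));
```

Passing the checker in as a callable keeps the loop separate from the download step, so the same loop works whether the URLs come from an array or from a database query.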

Feel free to suggest more cases where this solution could be used, or reach out to me if you have faced a similar case and want a solution for it.

 

Nirzari Shah

Nirzari Shah is a technical analyst at Tatvic.

Comments

  • Thanks for this. Does this need a dump of URLs, or can it randomly follow links on a website and come up with the missing pages?

    • Hi,

      Thank you for your comment.

      You need to provide a dump of URLs, either from a database or as an array.
      When you provide one URL, the program checks whether GA is implemented on that page or not.
      It doesn’t check for the other URLs which are present on that page.

      Hope I have cleared your doubt. Feel free to ask if you have more doubts.

  • Aniruddha Banerjee
    January 22, 2013 8:55 am

    Hi Nizari,

    I checked this code with a few sites but I get the "not implemented" result.
    Check your example site: GA code is implemented there and I have cURL installed, but the result is negative.

    Aniruddha

    • Hi Aniruddha,

      Thank you for testing this code and providing your comment. Yes, you are right. The code was not working correctly for all domains because of the regex for the script tag. I have now updated it, so you can try it on other domains. Please check once again and share your feedback. Keep reading our blogs!
