Where You Been Web Analytics

What would you do if you had access to all of your visitors' browser history? You would know if they have already visited your affiliate program's landing page, your competitors' sites, or other sites in your campaign. You may be able to make an educated guess about their age, sex, level of affluence, ethnicity and interests. You could redirect, greet or sell differently to each visitor depending on where they have been.

Well, with this free (BSD License) whereyoubeen script you can do pretty much that. More specifically, for each page in a list of URLs you provide, you get a YES or NO answer to whether the visitor has been there.

Where You Been

Where You Been is a collection of scripts that I created to test a list of user-provided URLs against the visitor's browser history and then save the matches to a database.

JavaScript (jQuery) is used to check the rendered color of links placed in a temporary DOM element, comparing each one against the a:visited color and the normal a color.

A jQuery.ajax() call is made to POST the array of visited links to a PHP script which then dumps them into a SQLite database.

A simple stats script is also provided. For each page in your list, it shows the fraction of users who have visited that page, counting only users who have visited at least one page in your list.
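To make that statistic concrete, here is a minimal sketch of the calculation in JavaScript. It is not the code from wyb_stats.php; the function name `visitFrequencies` and the sample rows are made up for illustration, and the rows are only assumed to look like (remote_addr, url) pairs from the whereyoubeen table.

```javascript
// Hypothetical helper: rows are {remote_addr, url} pairs, one per
// visited URL, which is how the whereyoubeen table is assumed to look.
function visitFrequencies(rows) {
  const usersByUrl = new Map(); // url -> Set of remote_addr values
  const allUsers = new Set();   // every user with at least one hit

  for (const { remote_addr, url } of rows) {
    allUsers.add(remote_addr);
    if (!usersByUrl.has(url)) {
      usersByUrl.set(url, new Set());
    }
    usersByUrl.get(url).add(remote_addr);
  }

  // Fraction of "users with at least one hit" who visited each url.
  const frequencies = {};
  for (const [url, users] of usersByUrl) {
    frequencies[url] = users.size / allUsers.size;
  }
  return frequencies;
}

// Made-up sample data: two users, one of whom visited both pages.
const sampleRows = [
  { remote_addr: "10.0.0.1", url: "http://www.ebay.com/" },
  { remote_addr: "10.0.0.1", url: "http://www.bing.com/" },
  { remote_addr: "10.0.0.2", url: "http://www.ebay.com/" },
];
const frequencies = visitFrequencies(sampleRows);
console.log(frequencies);
```

Users with no hits never appear in the rows at all, which is why the denominator only counts users who visited at least one page.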

Download WhereYouBeen

You can download the whereyoubeen tool ready to go right here. Note that it requires PHP 5.2 with PDO/SQLite support. PDO/SQLite comes standard with current binary builds of PHP. If you think your host isn't running PHP 5, try adding the following line to your .htaccess file: AddHandler application/x-httpd-php5 .php

Now let's look at how we can invoke the script in a PHP file that will be shown to surfers.

Calling Where You Been (index.php)

<html>
 <head>
  <title>My Webpage</title>
  <?php
   // if you already have jQuery included, you can set this to True
   $ALREADY_HAVE_JQUERY=False;
   // set this to the path where the script files are
   $WYB_PATH="/wyb";
   // this includes all the headers and javascript
   include($_SERVER['DOCUMENT_ROOT'] .
           "$WYB_PATH/wyb_header.inc.php"); 
  ?>
 
 </head>
 <body>
   <p>This is some content to show the user</p>
 </body>
</html>

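That's pretty much it. You can edit wyb-sites-to-check.txt to customize which EXACT URLs to check. A URL only matches if it is character for character the address the visitor loaded, so http://dfhu.org/blog/ and http://dfhu.org/blog/index.php are two separate entries. The stats can be viewed in wyb_stats.php.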
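The list of URLs ends up in the page as the contents of a JavaScript array. As a rough sketch of that conversion (this is not the code in wyb_urls_get.php, and I am assuming the text file holds one URL per line), something like the following would do it. The function name `urlsToArrayLiteral` is made up for this example.

```javascript
// Hypothetical helper: turn the text of a one-URL-per-line file into
// a comma separated list of quoted strings, ready to drop between
// the [ and ] of a JavaScript array literal.
function urlsToArrayLiteral(fileText) {
  return fileText
    .split(/\r?\n/)                   // one entry per line
    .map((line) => line.trim())       // ignore stray whitespace
    .filter((line) => line !== "")    // skip blank lines
    .map((url) => JSON.stringify(url)) // quote and escape each url
    .join(",\n");
}

const sampleFile = "http://www.ebay.com/\n\n  http://dfhu.org/blog/  \n";
const literal = urlsToArrayLiteral(sampleFile);
console.log("[" + literal + "]");
```

JSON.stringify produces double-quoted strings rather than the single-quoted ones in the rendered page, but both are valid JavaScript string literals.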

Onto The Code

Now, for those who are interested, let's look at some of the files that make up the whereyoubeen tools.

The header pulls all the links to check into the JavaScript and then calls the whereyoubeen JavaScript function.

WYB Header (wyb_header.inc.php)

<?php
/*
  @file: wyb_header.inc.php - prints the whereyoubeen javascript. This
    file should be included in the page you want to check. Using
    something like
 
 
         // If you are already using jquery, set this to True
         $ALREADY_HAVE_JQUERY=False;
         // would be www.example.com/whereyoubeen/
         $WYB_PATH="/whereyoubeen"; 
         // this includes the header
         include($_SERVER['DOCUMENT_ROOT'] .
          "$WYB_PATH/wyb_header.inc.php"); 
 
 
 
  @author: Victory
  @site: http://dfhu.org/blog/
  @version: 1.0
  @date: 090721
  @license: BSD
 
 */
 
// A better check would be to see if $_COOKIE is set, but that would
// require a set page and a check page. Until then, just check whether
// the User-Agent starts with Mozilla or Opera (Safari also reports
// Mozilla), and if not ...
if(!isset($_SERVER['HTTP_USER_AGENT']) ||
   !preg_match("/^(Mozilla|Opera)/",$_SERVER['HTTP_USER_AGENT'])){
  // ... bail without printing anything into the page.
  return;
}
 
// construct the path to whereyoubeen on the server
$WYB_ABS_PATH=
  $_SERVER['DOCUMENT_ROOT'] . 
  "$WYB_PATH/";
 
// open up the sqlite database;
$db=new PDO("sqlite:$WYB_ABS_PATH/db/wyb.sqlite");
 
// Now, Check to see if the user's IP is in the users table by ...
$sql="
SELECT
 rowid
FROM 
 users
WHERE
 remote_addr=:remote_addr
 ";
// ... preparing and ...
$stmt=$db->prepare($sql);
// ... executing the $sql statement using the user's IP address.
$stmt->execute(Array(":remote_addr"=>$_SERVER['REMOTE_ADDR']));
 
// if the user's ip is in the database ...
if($stmt->fetch()){
  // ... return and don't run any tests.
  return;
}
 
// So now that we decided we are going to go ahead with the tests,
// lets ensure that the jquery variable is set.
if(!isset($ALREADY_HAVE_JQUERY)){
  $ALREADY_HAVE_JQUERY=False;
}
 
// If we need jquery ...
if(!$ALREADY_HAVE_JQUERY){
  // ... then print the script element to include it.
echo "
<script 
  type=\"text/javascript\" 
  src=\"$WYB_PATH/js/jquery-1.3.2.min.js\"></script>
";
}
 
// Print the include statement for whereyoubeen.js which contains the
// logic to check which sites the user has been to.
echo "
<script 
  type=\"text/javascript\"
  src=\"$WYB_PATH/js/whereyoubeen.js\"></script>
";
?>
 
<style type="text/css">
/* set up different colors for visited/non visited links */
ul#silent_append a{
color: #F00 !important;
}
ul#silent_append a:visited{
color: #00F !important;
}
</style>
 
<script type="text/javascript">
 
<?php
  // If you already have jquery in your page, then don't mess with
  // your preference for conflict.
  if(!$ALREADY_HAVE_JQUERY){
    echo "jQuery.noConflict();";
  }
 
// Wait for the document to be ready ...
?>
jQuery(document).ready(function(){
 
  <?php
    // ... and when it is, we need to get the urls to check. Including
    // wyb_urls_get.php will produce a javascript parsable list of
    // quoted urls. We store that array in utc (Urls To Check).
  ?>
  var utc = 
    [<?php 
     include($_SERVER['DOCUMENT_ROOT'] . 
             "$WYB_PATH/wyb_urls_get.php"); 
     ?>];
 
  <?php
    // For all the links in utc, check to see if they are in the
    // user's history. This is done by checking the color of the
    // links as rendered in a hidden element of the dom.
  ?>
  var visited=
    whereyoubeen(
      utc,
      '<?php echo $WYB_PATH; ?>/wyb_urls_save.php');
 
  <?php
 
  // You could also use 'visited' here to change the DOM, for instance
  // create a meta redirect, or place a "Warning About Competitor,"
  // popup on the page and so on.
 
  ?>
});
 
</script>

Including this header in the index.php above will produce something like:

Actualized (index.php)

<html>
 <head>
  <title>My Webpage</title>
 
  <script 
    type="text/javascript" 
    src="/wyb/js/jquery-1.3.2.min.js"></script>
 
   <script 
     type="text/javascript"
     src="/wyb/js/whereyoubeen.js"></script>
 
 <style type="text/css">
  /* set up different colors for visited/non visited links */
   ul#silent_append a{
   color: #F00 !important;
  }
  ul#silent_append a:visited{
   color: #00F !important;
  }
 
</style>
 
<script type="text/javascript">
 
jQuery.noConflict();
jQuery(document).ready(function(){
 
    var utc = 
      ['http://dfhu.org/blog/index.php',
       'http://www.bing.com/',
       'http://www.whycanttoryread.com/',
       'http://dfhu.org/blog/',
       'http://www.ebay.com/',
       'http://www.craigslist.org/',
       'http://chicago.craigslist.org/',
       'http://exactly.com/as/itwould.html'];
 
    var visited=
      whereyoubeen(
        utc,
        '/wyb/wyb_urls_save.php');
});
 
 </script>
 </head>
 <body>
   <p>This is some content to show the user</p>
 </body>
</html>

NOTE: the tests do not run on every visit, only about once per day for each IP address. This is accomplished with a SQLite queue, which is maintained by a TRIGGER that deletes user rows older than one day. The following script creates the database schema. Running it also clears out any data in the database.
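If the queue idea is unclear, here is a small in-memory model of it in JavaScript. It only imitates the behavior described above: a Map stands in for the users table, and the function names are invented for this sketch. I am also assuming that the row for a user is inserted when their results are saved, since that part of the code is not shown here.

```javascript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical stand-in for the users table plus its cleanup trigger.
function createVisitGate() {
  const lastVisitByAddr = new Map(); // remote_addr -> timestamp (ms)

  return {
    // Plays the role of the SELECT in wyb_header.inc.php: if the
    // address is still in the table, the tests are skipped.
    hasRecentVisit(addr) {
      return lastVisitByAddr.has(addr);
    },

    // Plays the role of an INSERT into users. Like the BEFORE INSERT
    // trigger, it first deletes every row older than one day.
    recordVisit(addr, now) {
      for (const [otherAddr, lastVisit] of lastVisitByAddr) {
        if (lastVisit < now - ONE_DAY_MS) {
          lastVisitByAddr.delete(otherAddr);
        }
      }
      lastVisitByAddr.set(addr, now);
    },
  };
}

const gate = createVisitGate();
gate.recordVisit("10.0.0.1", 0);
const skippedSameDay = gate.hasRecentVisit("10.0.0.1");

// Two days later a different user is recorded, which prunes the
// stale row for the first user.
gate.recordVisit("10.0.0.2", 2 * ONE_DAY_MS);
const skippedAfterPrune = gate.hasRecentVisit("10.0.0.1");
console.log(skippedSameDay, skippedAfterPrune);
```

One thing the model makes visible: old rows are only removed when some insert happens, so on a quiet site a returning visitor can stay in the table for longer than a day.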

The Database Schema (wyb_makedb.php)

 
<?php
 
$db=new PDO("sqlite:db/wyb.sqlite");
 
$sql="DROP TABLE IF EXISTS users";
$db->query($sql);
 
$sql="
CREATE TABLE IF NOT EXISTS users (
 remote_host TEXT,
 remote_addr TEXT UNIQUE,
 last_visit DATETIME DEFAULT CURRENT_TIMESTAMP
);";
$db->query($sql);
 
 
$sql="DROP INDEX IF EXISTS remote_addr_idx";
$db->query($sql);
 
 
$sql="
CREATE INDEX IF NOT EXISTS remote_addr_idx 
 ON users(remote_addr)
";
$db->query($sql);
 
 
$sql="DROP TABLE IF EXISTS whereyoubeen";
$db->query($sql);
 
$sql="
CREATE TABLE IF NOT EXISTS whereyoubeen (
 remote_addr TEXT,
 remote_host TEXT,
 user_agent TEXT,
 url TEXT,
 last_visit DATETIME DEFAULT CURRENT_TIMESTAMP
)";
$db->query($sql);
 
$sql="
CREATE TRIGGER IF NOT EXISTS
 clean_up_old 
BEFORE INSERT ON 
 users
BEGIN
 DELETE FROM 
  users 
 WHERE 
  last_visit < DATETIME('NOW','-1 day');
END
";
$db->query($sql);
?>

I didn't comment this much, because SQL (being a declarative language and all) is pretty easy to read directly. When you are testing a new setup, you can run this script in between visits to clean out the user data (so that wyb_header.inc.php will fire again). Otherwise, you could use the sqlite3 command line client to delete the rows manually.

Manually deleting user data with SQLite

shell% sqlite3              
SQLite version 3.5.9
Enter ".help" for instructions
sqlite> attach database 'db/wyb.sqlite' as wyb;
sqlite> delete from wyb.users;

The code for getting the URLs (wyb_urls_get.php) and saving the results (wyb_urls_save.php) is not really that interesting. Both files are well commented, and if you have questions you can post them in the comments.

Now let's look at the JavaScript. Remember that this requires jQuery.

Pseudo-Searching Browser History (js/whereyoubeen.js)

 
function whereyoubeen(urlsToCheck,path_to_savelinks){
 
  // If we have no links to check ...
  if(urlsToCheck.length == 0){
    // ... just bail.
    return;
  }
 
  // So we have links to check and don't like to write 'jQuery' over
  // and over again so we shorten it to $.
  var $=jQuery; 
 
 
  // To organize links a bit we build a hidden ul so that we can
  // append the links to test.
  $('body')
    .append("<ul id='silent_append'></ul>");
 
 
  // Let's create a closure to append links to ...
  function appendLink(href,id){
    // ... append a link to #silent_append.
 
    // @param string href - link's href
    //
    // @param string id - the id of li that holds the link. If it's
    // not set a random id will be chosen
    //
    // Returns id which may have been generated randomly
 
    // We are going to use the id later to find css and to remove,
    // so let's create one if we don't have one. The prefix keeps the
    // id from starting with a digit.
    if(!id){
      id="wyb_"+Math.floor(Math.random()*10000);
    }
 
    // A modest, but proud, link is created here.
    var link = 
    '<a id="' +id+ '" href="'
      +href+'">'+href+'</a>';
 
    // Place that link in the id="silent_append" ul where it will be
    // easy to get to.
    $("#silent_append")
      .append('<li style="display:block;" id="li_' +id+
                  '">'+link+'</li>');	       
 
    return id;
  }// appendLink
 
 
 
  // We are going to see what the css color is for a URL that has not
  // been visited, so here is a URL that hasn't been visited.
  var noVisited = 
    'http://' + 
    Math.floor(Math.random()*1000000) +  
    ".com";
 
  // Now if we append a link we haven't visited and one that we have
  // visited (the current page) then ...
  appendLink(document.location.href,"yes_visted");
  appendLink(noVisited,'no_visted');
 
  // ... we can calibrate using the links color to see what the colors
  // are for visited and unvisited links.
  var yesVisitedColor =
    $("#yes_visted").css("color");
  var noVisitedColor = 
    $("#no_visted").css("color");
 
  // If those colors are the same ...
  if(yesVisitedColor == noVisitedColor){
    // ... then just forget about it, your css is crap, i am going
    // home.
    return; 
  }
 
  // Remove the elements from the dom to keep the site moving
  // zippy.
  $("#li_yes_visted").remove();
  $("#li_no_visted").remove();
 
 
  // We need a place to store links that have been visited, so let's
  // use an array.
  var visited=[];
 
  // now for each of the links to check ...
  for(var i=0;i<urlsToCheck.length;i++){
    // ... append them to our ul so that we can ...
    var idOfString=appendLink(urlsToCheck[i]);
 
    // ... check their color against the visited color and ...
    var curColor=$("#" + idOfString).css("color");
    if(curColor == yesVisitedColor){
      // ... if it matches we append it to the visited array.
      visited.push(urlsToCheck[i]);
    }
 
    // we won't be needing that link clogging up the dom anymore so
    // let's remove it.  
    $("#li_" + idOfString).remove();
  }// for urlsToCheck
 
  // We then send our results off to the database via PHP.
  $.ajax({
    type: "POST",
    url: path_to_savelinks,
    data: {'visited': visited.join("|||")},
    success: function(data){
      // you could uncomment this if you wanted savelinks to say
      // something, or maybe send back an XML object 
      //$('body').append("<p>" + data + "</p>");
    }
  });
 
  // finally we return visited so it can be used to affect the dom
  return visited;
};

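The highlight of the script is the calibration step: it creates one link that is known to be visited and one that is known to be unvisited, then reads back their colors. After that, the rendered color of each test link is a tell-tale sign of whether a user has visited a given page or not.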
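Stripped of the DOM work, the comparison boils down to a few lines. The sketch below is a simplified restatement in plain JavaScript, not the function above: `pickVisited` and the sample color strings are invented for illustration, and the colors are assumed to have already been read from the rendered links.

```javascript
// Hypothetical helper: given the two calibrated colors and the
// rendered color of each test link, return the visited urls.
function pickVisited(calibration, samples) {
  // If visited and unvisited links render the same, the color tells
  // us nothing, so report no matches.
  if (calibration.visited === calibration.unvisited) {
    return [];
  }
  return samples
    .filter((sample) => sample.color === calibration.visited)
    .map((sample) => sample.url);
}

// Made-up colors in the format jQuery's .css("color") usually returns.
const calibration = {
  visited: "rgb(0, 0, 255)",
  unvisited: "rgb(255, 0, 0)",
};
const samples = [
  { url: "http://www.ebay.com/", color: "rgb(0, 0, 255)" },
  { url: "http://www.bing.com/", color: "rgb(255, 0, 0)" },
];
const visitedUrls = pickVisited(calibration, samples);
console.log(visitedUrls);
```

Comparing against the calibrated values, rather than hard coding the colors from the stylesheet, means the check does not depend on how a particular browser formats color strings.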

The script then uses an AJAX POST to send the data to PHP, which in turn saves it to the SQLite database.
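The visited list travels as a single string, joined with "|||" as in the $.ajax call above. Here is a minimal sketch of that encoding and of how a receiver could undo it. The decoding half is my assumption about what the saving script needs to do, not a copy of it, and both function names are made up.

```javascript
const SEPARATOR = "|||"; // same separator the $.ajax call joins with

// Hypothetical helper: what the browser side sends as 'visited'.
function encodeVisited(visited) {
  return visited.join(SEPARATOR);
}

// Hypothetical helper: how a receiver could recover the list. An
// empty payload means no urls, not one empty url.
function decodeVisited(payload) {
  return payload === "" ? [] : payload.split(SEPARATOR);
}

const payload = encodeVisited([
  "http://www.ebay.com/",
  "http://www.craigslist.org/",
]);
const decoded = decodeVisited(payload);
console.log(payload);
console.log(decoded);
```

This only works as long as no URL in your list contains "|||" itself, which is unlikely but worth knowing.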

I really need feedback from you.

If you are technical I would love for you to point out any bugs or issues you see. I also like to chat about coding styles and idioms so feel free to post that flavor of discussion too.

If you are not technical then let me know if you had any problems setting up the script and any clever ideas of how to use this very juicy information.

There are countless mods and possibilities for these tools, and I would love to hear your ideas on what this could be used for. If it's simple enough I might implement it for free and put it on here; if it's more involved we can work out a fair price for private coding work.

This is given with a BSD license so you can use it in your commercial projects and distribute it on your website (I just ask for a link back to me at dfhu.org).
