Catch More Juice With Link Circumcision

Although this seems like SEO 201, I still see many, many sites (including this one) not dealing with ill-formed auto-generated links. By “auto-generated” I mean links inside email clients, message boards, social sites, etc., which automagically turn URLs into links. This results in 404s on your site and a loss of link love.

Dealing with the most common cases is pretty easy, and I will show how it can be done with mod_rewrite and PHP.

As an added bonus I include a handy little tool to write unit tests to check your redirects, even if you don’t circumcise your links.

The problem is that links like http://site.faux/foo.php, (note the trailing comma) or http://site.faux/foo.php! are too often created automatically when people type your links into some online form or email client.

For example:

I really liked what Jonny had to say over at, http://yoursite.faux/i-rock.php, It was so fun. I wish for you to read it and buy all of his stuff. He said that i could drive his new car, you can read about it http://yoursite.faux/?type=new!
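To see how the punctuation ends up inside the link, here is a minimal sketch of a naive auto-linker. The function name naive_find_urls and its regex are my own assumptions for illustration, not how any particular email client or forum works; the point is only that "everything up to the next space" swallows the trailing comma or exclamation mark.

```php
<?php
// Hypothetical naive auto-linker: treats every run of non-whitespace
// characters after "http://" or "https://" as part of the URL.
function naive_find_urls(string $text): array {
    preg_match_all('~https?://\S+~', $text, $matches);
    return $matches[0];
}

// A shortened version of the example text above.
$text = "I really liked what Jonny had to say over at, "
      . "http://yoursite.faux/i-rock.php, It was so fun. "
      . "You can read about it http://yoursite.faux/?type=new!";

// Prints each captured link, trailing punctuation included.
foreach (naive_find_urls($text) as $url) {
    echo $url, "\n";
}
```

Both captured links carry the punctuation that followed them in the sentence, and each would 404 on a site that does not handle it.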

Link circumcision is the process of removing the trailing characters that often get tacked onto incoming auto-generated links. Common characters include ‘.’, ‘!’, ‘,’, ‘)’, etc.
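Stripped of all the server plumbing, the idea fits in one line of PHP. This is only a sketch: circumcise_url is a hypothetical helper name, and the character list is an assumption that you should adjust for your own site (some sites have legitimate URLs ending in ‘)’, for example).

```php
<?php
// Hypothetical helper: remove every trailing character that is in the
// "bad" set . ! , ) ] " ' & ; :
function circumcise_url(string $url): string {
    // rtrim() strips any run of the listed characters from the end.
    return rtrim($url, ".!,)]\"'&;:");
}

echo circumcise_url("http://site.faux/foo.php!"), "\n";
echo circumcise_url("http://site.faux/foo.php"), "\n";
```

A URL without trailing punctuation comes back unchanged, so the helper is safe to run on every request.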

To keep these bad links from losing all that juice, we start by creating a single entry style mod_rewrite section in our .htaccess file. You don’t have to create a single entry URL if you don’t wish, but it is a pretty common method.

In this case all non-existent files/directories will be routed through a single index.php file.

When there is no query string, bad characters will be lopped off before we pass the request off to PHP.

mod_rewrite link circumcision (/trailing/.htaccess)

# -*- conf -*-
# @file .htaccess for "fixing" trailing punctuation
# @author Victory dfhu.org
# BSD License
 
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /trailing/
 
# Bad chars .!,)]"'&;:
# This block has to come before the single entry rule. Otherwise every
# non-existent path is rewritten to index.php first and this rule
# never fires.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} (\.|!|,|\)|\]|\"|\'|&|;|:)$
# Redirect to the same page, minus the last character
RewriteRule ^(.*).$ $1 [L,R=301]
 
# single entry
# you may be able to remove [L] for "Last" depending on how the 
# rest of your rewrite rules and conditions are written.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . index.php [L]
 
</IfModule>

Now that the cases without URL GET parameters are taken care of at the Apache level, we can deal with the query string cases at the PHP level.

php link circumcision

<?php
/*
@author: Victory
@site: http://dfhu.org
@copyright: dfhu.org
@report_bugs: bugs(at)dfhu.org
@feature_request: features(at)dfhu.org
@file: index.php
@license: BSD
 
@description:
 
  Example of redirecting on bad trailing punctuation, commonly
  found by auto link generators in blogs, email clients, etc.
 
*/
 
// We start by building a regex of the most common trailing chars.
$trailing_regex = "/(\.|!|,|\)|\]|\"|\'|&|;|:)$/"; 
 
// Store the Request for easy access.
$request = 
  $_SERVER['REQUEST_URI'];
 
// Having already dealt with REQUEST_URI problems in .htaccess we can
// mostly stick to just the query string here in php. So, if we find a
// trailing punctuation mark then ...
if (preg_match($trailing_regex,$request)) {
  // ... we remove the last char from the request
  $fixed_url =
    substr($request,0,-1);
 
  // and redirect permanently (301) to that fixed url. Without an
  // explicit code, a Location header is sent as a 302.
  header("Location: $fixed_url", true, 301);
  // Poof! we're done.
  exit;
}
}
 
print_r($_SERVER['REQUEST_URI']);
?>
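One thing to be aware of: the script above removes a single character per redirect, so a link ending in an ellipsis takes three hops to reach the clean URL. The sketch below compares that with stripping the whole run of punctuation at once. Both function names are hypothetical, and the character class is my compact rewrite of the same list of bad characters.

```php
<?php
// Same bad characters as the regex above, written as a character class.
const TRAILING_CLASS = '[.!,)\]"\'&;:]';

// Hypothetical helper: count how many one-character redirects the
// "remove the last char" approach needs for a given request URI.
function hops_needed(string $uri): int {
    $hops = 0;
    while (preg_match('/' . TRAILING_CLASS . '$/', $uri)) {
        $uri = substr($uri, 0, -1);
        $hops++;
    }
    return $hops;
}

// Hypothetical helper: strip the whole run of trailing punctuation so a
// single redirect is enough.
function fix_in_one_hop(string $uri): string {
    return preg_replace('/' . TRAILING_CLASS . '+$/', '', $uri);
}

echo hops_needed("/trailing/index.php..."), "\n";
echo fix_in_one_hop("/trailing/index.php..."), "\n";
```

Whether the extra hops matter is a judgment call; a chain of 301s still ends up in the right place, but a single redirect is tidier.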

Handy Redirect Unit Tests

It is very handy to have a simple framework for unit testing URL redirects, because you can fix one redirect and easily break others. Writing down all the tests you wish to run up front saves you a lot of time.

With my method of testing you name your test ($title), give the source ($src) and destination ($dest) URLs, and check that where you start gets you to where you want to go.

Redirect Tests

<?php
/*
 
@author: Victory
@site: http://dfhu.org
@copyright: dfhu.org
@report_bugs: bugs(at)dfhu.org
@feature_request: features(at)dfhu.org
@file: tests.php
@license: BSD
 
@description:
 
  Unit tests for Link Circumcision
 
*/
 
// The characters you want to cut off the end of urls.
$bad_tips = Array(
  ".","!",",",":",
  "]",")",";","&");
 
/**
 * @brief A struct to hold request results
 *
 */
class LoadResult {
  protected $http_code;
  protected $url;
  protected $redirect_count;
 
 
  /**
   * @brief Store the $header info
   *
   * @param $header $header = curl_getinfo($ch);
   *
   */
  function __construct($header){
    $this->http_code = $header['http_code'];
    $this->url = $header['url'];
    $this->redirect_count = $header['redirect_count'];
  }
 
  /**
   * @brief  lazy man's getter
   *
   * @param $kk key
   *
   * @return
   *
   */
  function __get($kk){
    return $this->$kk;
  }
}
 
 
/**
 * @brief retrieve a url.
 *
 * Gets header, follows redirects, fails on error
 *
 * @param $url The url to retrieve
 *
 * @return LoadResult - lr with results of loading
 *
 */
function load_url($url){
 
  $curl_options = Array(
    CURLOPT_URL => $url,
    CURLOPT_HEADER => TRUE,
    CURLOPT_FAILONERROR => TRUE,
    CURLOPT_FOLLOWLOCATION => TRUE,
    CURLOPT_RETURNTRANSFER => TRUE,
    CURLOPT_USERAGENT => "DFHU Url Test (dfhu.org)",
    CURLOPT_AUTOREFERER => TRUE,
    CURLOPT_TIMEOUT => 3,
    CURLOPT_MAXREDIRS => 10
  );
  $ch = curl_init();
  curl_setopt_array($ch, $curl_options);
 
  $result = curl_exec($ch);
  $header = curl_getinfo($ch);
  curl_close($ch);
 
  return new LoadResult($header);
 
}
 
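// ye olde echo function; escapes HTML so characters such as & in
// titles and urls are displayed literally
function e($str){
  echo htmlspecialchars($str);
}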
 
 
/**
 * @brief compare $src to $dest
 *
 * @param $title - the "name" of the test
 * @param $src - The entry point url
 * @param $dest - the expected end point url
 *
 */
function redirect_test($title,$src,$dest){
  // Get the url for src and store it
  $lr=load_url($src);
 
  // See if the results are bad, ok, or great success.
  if ($lr->url != $dest) {
    $class="error";
  }elseif ($lr->http_code != "200" or
	   $lr->redirect_count > 1) {
    $class="warning";
  }else{
    $class="success";
  }
 
  // Output the results
  ?>
<h2><?php e($title) ?></h2>
<table class="<?php e($class) ?>">
  <tr><td>URL</td><td><?php e($src) ?></td></tr>
  <tr><td>Expected</td><td><?php e($dest) ?></td></tr>
  <tr><td>Result</td><td><?php e($lr->url) ?></td></tr>
  <tr><td>Code</td><td><?php e($lr->http_code) ?></td></tr>
  <tr><td>Redirects</td><td><?php e($lr->redirect_count) ?></td></tr>
</table>
  <?php
 
}//redirect_test
 
// Bitch and Bail if PHP curl is not loaded.
if (!extension_loaded('curl')) {
  echo "
You need Curl Loaded to run these tests. The redirects still might
work, but I can't test them.
";
  exit;
}
// -----------------------------------------------
// ---- Start of HTML ----------------------------
// -----------------------------------------------
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>Unit Tests for Link Circumcision</title>
  <meta http-equiv="content-type" 
           content="text/html; charset=utf-8">
  <style type="text/css">
 
  table{
   padding-right: 30px;
  }
  table.success {
   background-color: #cfc;
   border: 1px solid #0F0;
  }
  table.warning {
   background-color: #ffc;
   border: 1px solid #FF0;
  }
  table.error {
   background-color: #fcc;
   border: 1px solid #F00;
  }
 
  </style>
</head>
<body>
 
  <blockquote>
    <b>Peter:</b> Do you charge a lot for your circumcisions?<br>
    <br>
    <b>Rabbi:</b> No. I just keep the tips! <br>
    <br> ~ Family Guy
  </blockquote>
 
<?php
// -----------------------------------------------
// ---- start of tests ---------------------------
// -----------------------------------------------
 
// store the host name for easy access
$host=$_SERVER['HTTP_HOST'];
 
$src="http://$host/trailing/";
redirect_test("Simple No redirect",
	      $src,
	      $src);
 
$src="http://$host/trailing/index.php...";
redirect_test("Ellipsis",
	      $src,
	      substr($src,0,-3));
 
$src="http://$host/trailing/foo.php";
redirect_test("Single Entry NO Trailing",
	      $src,
	      $src);
 
$src="http://$host/trailing/foo.php.";
redirect_test("Single Entry Trailing",
	      $src,
	      substr($src,0,-1));
 
$src="http://$host/trailing/foo.php?bar=baz";
redirect_test("Params No Trailing",
	      $src,
	      $src);
 
$src="http://$host/trailing/foo.php?bar=baz,";
redirect_test("Params Trailing",
	      $src,
	      substr($src,0,-1));
 
$src="http://$host/trailing/foo.php?bar=baz";
redirect_test("Single Entry Params No Trailing",
	      $src,
	      $src);
 
$src="http://$host/trailing/foo.php?bar=baz!";
redirect_test("Single Entry Params Trailing",
	      $src,
	      substr($src,0,-1));
 
// Do basic test against bad tips, you could test all of the above if
// you wanted to be verbose.
foreach($bad_tips as $bad_tip){
 
  $src="http://$host/trailing/foo.php?bar=baz$bad_tip";
  redirect_test("Single Entry Bad Tip $bad_tip",
		$src,
		substr($src,0,-1));
 
  $src="http://$host/trailing/index.php$bad_tip";
  redirect_test("Bad Tip $bad_tip",
		$src,
		substr($src,0,-1));
 
}
 
?>
 
</body>
</html>

Here is example output from the redirect tests.

[Figure: example PHP unit testing redirect output, titled “Testing redirects for Link Circumcision”]

To add a new test, just use redirect_test("Name of Test", $where_you_start, $where_you_should_end)
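If you have a lot of pages to cover, it can help to build the test cases as plain data first and then feed them to redirect_test. Here is a minimal sketch, assuming a hypothetical helper called build_bad_tip_cases; it only builds the title, source, and destination triples and makes no requests itself.

```php
<?php
// Hypothetical helper: for each bad tip, build a [title, src, dest]
// triple where src is the clean url plus the tip and dest is the
// clean url.
function build_bad_tip_cases(string $base, array $tips): array {
    $cases = [];
    foreach ($tips as $tip) {
        $cases[] = ["Bad Tip $tip", $base . $tip, $base];
    }
    return $cases;
}

$cases = build_bad_tip_cases(
    "http://site.faux/trailing/foo.php",
    [".", "!", ","]
);

foreach ($cases as [$title, $src, $dest]) {
    // Inside tests.php you would call:
    //   redirect_test($title, $src, $dest);
    echo "$title: $src -> $dest\n";
}
```

Keeping the cases as data also makes it easy to reuse the same list for several base URLs.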
