(ab)using Apache’s `ab` Command to “multi-thread” PHP Files

We are going to look at how to (ab)use Apache’s `ab` utility to run concurrent tasks in PHP. This can greatly speed up tasks such as scraping, proxy checking, and “stats geo-location.”

It’s not uncommon to have a list of ‘todo’ items for your PHP script to process, where each todo item could be run concurrently and independently.

For example, let’s say your PHP script goes to your favorite search engine and searches for some keyword. One hundred results are returned and you want to download each of those result pages.

There is no benefit to fetching the resulting pages one at a time; it would be faster to get, say, twenty-five pages at a time.


You could break this task up so that the PHP “searcher” functionality is in one file and the “page getter” functionality is in another. For example, searcher.php searches for “Jobs in Portland” and stores the resulting URLs in an SQL table. Then page_getter.php goes through each URL one by one and downloads it.

Or, even worse than that, you could put the search and the page-getting functions in one file, which takes a long time to run, and if something goes wrong you have to backtrack.

It’s also possible that you want the links on the pages that you downloaded to be processed too. This is slow and cumbersome to do with either of the above methods.

What you really want is threading. PHP doesn’t have native threading support.

There are a few examples of faking PHP threads. Oftentimes they rely on calling the `proc_*` functions (such as `proc_open`) or `exec`, which are tricky little functions to deal with, and many people (including myself) disable them in php.ini for security reasons.

Additionally, faking threading in PHP using proc/exec often means that the script has to be designed around the threading classes to begin with. With the method I use, you can get away with little or no modification to your existing “doer” type files. In the example above, the doer file (called do.php later in this post) would be page_getter.php.

For those who have not used `ab` before, it’s the Apache benchmarking utility that comes standard with most Apache packages.

The man page for `ab` is very straightforward. We will mostly take advantage of the -n and -c options. The -n option sets the total number of requests to make to a URL, and -c sets the number of requests to make concurrently. By creating a “todo” table and setting -c greater than one, we can pipeline our processing.

Threading in PHP

Let’s show how this is done with a trivial example.

There are basically four steps.

1) Create and Populate a “todo” table.

2) SELECT and DELETE a todo item from that table, making sure that no two doers get the same todo item. This is the controller or “get next” file.

3) Include the do.php file, passing the value you selected/deleted from the todo list.

4) Optionally, insert any new tasks into the todo list.

It’s recommended, but not shown here, that you create a “done” list, where you cannot insert anything into todo that is already listed in “done.” This prevents infinite loops.
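
A “done” list along those lines might look something like the following. This is only a rough sketch and not part of the files in this post: the `done` table, the UNIQUE constraints, and the helper names `mark_done()` and `add_todo()` are all my own assumptions, and it uses an in-memory SQLite database so that it can be run on its own.

```php
<?php
// Sketch only: the "done" table and both helpers below are hypothetical.
$pdo = new PDO("sqlite::memory:");
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// UNIQUE keeps the same url from being queued (or marked done) twice.
$pdo->exec("CREATE TABLE IF NOT EXISTS todo (url TEXT UNIQUE)");
$pdo->exec("CREATE TABLE IF NOT EXISTS done (url TEXT UNIQUE)");

// Record a url as finished.
function mark_done(PDO $pdo, $url) {
  $stmt = $pdo->prepare("INSERT OR IGNORE INTO done (url) VALUES (:url)");
  $stmt->execute(array(":url" => $url));
}

// Queue a url unless it is already in "done" or already queued.
// Returns true only if a row was actually inserted.
function add_todo(PDO $pdo, $url) {
  $sql = "
INSERT OR IGNORE INTO todo (url)
SELECT :url
WHERE NOT EXISTS (SELECT 1 FROM done WHERE url = :done_url)";
  $stmt = $pdo->prepare($sql);
  $stmt->execute(array(":url" => $url, ":done_url" => $url));
  return $stmt->rowCount() > 0;
}
```

The idea would be to call `mark_done()` once a url has been processed successfully, and to route every new todo insert through `add_todo()`.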

Creating and Populating the todo table

<?php
$pdo=new PDO("sqlite:./db/todo.sqlite");
$pdo->setAttribute(PDO::ATTR_ERRMODE,
                PDO::ERRMODE_EXCEPTION);
 
 
// Create a simple table, with things you want to do concurrently
$sql="
CREATE TABLE IF NOT EXISTS 
  todo (
   url TEXT
  )";
$pdo->query($sql);
 
 
/* fill the todo table */
 
// you could fill the todo table, from results of the individual
// "threads"
 
$sql="
INSERT INTO todo
  (url)
    VALUES
  (:url)
";
$stmt=$pdo->prepare($sql);
 
// Remember that in SQLite you should wrap multiple inserts in a single
// transaction; otherwise each insert is committed separately, which is slow.
$pdo->beginTransaction();
// add 300 things to the todo list
for($ii=0; $ii<300; $ii++){
  $todo_url=
    Array(":url"=>"http://site.faux/page$ii.html");
  $stmt->execute($todo_url);
}
// Ok we are done with inserts, write everything to the table.
$pdo->commit();
 
/**/
?>

Now we need the controller. This is the file that `ab` requests.

The Getnext Function (getnext.php)

<?php
$pdo=new PDO("sqlite:./db/todo.sqlite");
$pdo->setAttribute(PDO::ATTR_ERRMODE,
                  PDO::ERRMODE_EXCEPTION);
 
 
// We need to begin an EXCLUSIVE transaction because we want to lock
// the file for both the select and the delete. If we don't lock the file
// then we could get multiple "threads" doing the same todo item.
$pdo->query("BEGIN EXCLUSIVE TRANSACTION");
 
$sql="
SELECT 
  url 
FROM 
  todo
LIMIT 1"; 
 
// Try to get the "next" item in the todo list and if there is no such
// item in the list ...
$result=$pdo->query($sql);
if(!$todo=$result->fetch(PDO::FETCH_ASSOC)){
  // ... then commit the transaction to unlock the table ...
  $pdo->query("COMMIT");
  // ... and bail.
  exit;
}
 
// Ok, seems we have a url, so lets remove it from the "todo" list now.
$sql="
DELETE FROM
  todo
WHERE
  url = :url";
$stmt=$pdo->prepare($sql);
$doing_url=Array(":url"=>$todo['url']);
$stmt->execute($doing_url);
 
// Now we want to close up the results cursor so that we can ...
$result->closeCursor();
// commit the transaction and write the data to the table.
$pdo->query("COMMIT");
 
 
 
// We will simulate having the todo url being sent via the
// $_POST super global.
$_POST['url']=$todo['url'];
 
// Now include the file that actually does something.
require('do.php');
 
 
/**/
// Optionally, have do.php set a variable of "success" or "failure" so
// that we can ...
// (!empty() avoids a warning if do.php never sets $thread_failed)
if(!empty($thread_failed)){
  // ... reinsert that item into the "todo" list
  $sql="
INSERT INTO 
 todo
  (url)
    VALUES
   (:url)
";
 
  $stmt=$pdo->prepare($sql);
  $stmt->execute($doing_url);
}
 
/**/
?>

Finally, the file that actually does something. In this example it doesn’t do much, but it’s included for completeness.

Do (do.php)

 
<?php
/** Do Something **/
echo $_POST['url'];
 
/** Maybe insert more values into todo table **/
 
// Silly example of setting a "failure" parameter:
// this fails roughly one time in ten.
$thread_failed=True;
if(10 < rand(0,100)){
  $thread_failed=False;
}
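
The “maybe insert more values into todo table” step could be sketched roughly as below, for the case where you want the links on a downloaded page to become new todo items. Treat it as an illustration under assumptions: `extract_links()` is a hypothetical helper, it assumes PHP’s DOM extension is available, it works on HTML you have already fetched into a string, and it only keeps absolute http(s) links.

```php
<?php
// Sketch only: extract_links() is a hypothetical helper, not part of do.php.
// It assumes the DOM extension is loaded and that $html was already fetched.
function extract_links($html) {
  if (trim($html) === "") {
    return array(); // nothing to parse
  }

  $doc = new DOMDocument();
  // Real-world HTML is rarely valid, so collect libxml's complaints quietly.
  $previous = libxml_use_internal_errors(true);
  $doc->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $links = array();
  foreach ($doc->getElementsByTagName("a") as $anchor) {
    $href = trim($anchor->getAttribute("href"));
    // Keep only absolute http(s) urls; relative links are skipped here.
    if (preg_match('#^https?://#i', $href)) {
      $links[$href] = true; // array keys de-duplicate for us
    }
  }
  return array_keys($links);
}
```

Each url returned could then be inserted into the todo table, ideally after checking it against a “done” list.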

Now all we need to do is call `ab` on the getnext.php file. Note that `ab` can be run from any server or from your desktop.

Calling getnext.php with ab

 
shell% ab -c 10 -n 200 "http://site.faux/getnext.php"

This will make 200 requests, and so process 200 todo items, with 10 concurrent connections. As an added bonus you will get cool output, something like the following.

Running ab on getnext

 
Benchmarking site (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests
 
 
Server Software:        Apache/2.2.9
Server Hostname:        site.faux
Server Port:            80
 
Document Path:          /concurrent/getnext.php
Document Length:        0 bytes
 
Concurrency Level:      10
Time taken for tests:   2.408 seconds
Complete requests:      200
Failed requests:        0
Write errors:           0
Total transferred:      49800 bytes
HTML transferred:       0 bytes
Requests per second:    83.05 [#/sec] (mean)
Time per request:       120.412 [ms] (mean)
Time per request:       12.041 [ms] \
   (mean, across all concurrent requests)
Transfer rate:          20.19 [Kbytes/sec] received
 
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   7.6      0      47
Processing:     7  112 204.3     28    1479
Waiting:        7  111 204.3     28    1478
Total:          8  114 204.8     29    1479
 
Percentage of the requests served within a certain time (ms)
  50%     29
  66%     56
  75%    120
  80%    171
  90%    281
  95%    554
  98%    700
  99%   1472
 100%   1479 (longest request)
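
The -n value does not have to be picked by hand. One possible approach, sketched below, is to count the rows still in the todo table and build the `ab` command line from that. `build_ab_command()` is a hypothetical helper of my own; it only returns the command as a string and does not run `ab` itself.

```php
<?php
// Sketch only: build_ab_command() is a hypothetical helper.
// It builds a command line; it does not execute anything.
function build_ab_command(PDO $pdo, $url, $concurrency) {
  $pending = (int) $pdo->query("SELECT COUNT(*) FROM todo")->fetchColumn();
  if ($pending === 0) {
    return null; // nothing left to do
  }
  // ab refuses a concurrency level larger than the number of requests.
  $concurrency = max(1, min((int) $concurrency, $pending));
  return sprintf("ab -c %d -n %d %s",
                 $concurrency, $pending, escapeshellarg($url));
}
```

Given a PDO handle on the todo database with 300 rows pending, a concurrency of 10 would give a command starting with `ab -c 10 -n 300`, which you can then paste into your shell.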

You can be more verbose using the -v 4 option in `ab`. This will give you the output of each call to do.php, which may be handy, or at least give you something to look at. You can also use -g to write TSV-format output to a file, which you could pass to plotting software such as gnuplot.

That (ab)out wraps it up. You now have a tool to speed up your pipeline-able tasks in PHP.

Hint: This method doesn’t require the server with getnext.php to be running Apache, and PHP is hardly the only language you could do this in.

Note: You can’t really send signals to the do.php files like you can with true threads.

Edit

Sending POST Data with Apache’s ab

Looking at the stats, this post is very popular and many people seem to want to know how to send POST data with `ab`. I will show an example. You will need a file with the POST data in it.

POST data (postdata.txt)

 
foo=bar&biz=baz

Then you will need to run `ab` with two new options: -p for the file containing the POST data, and -T to set the Content-Type header.

Calling ab with POST data

 
shell% ab -c10 \
          -n100 \
          -v4 \
          -p postdata.txt \
          -T 'application/x-www-form-urlencoded' \
          http://site.faux/post/index.php
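
If your POST values contain spaces or other special characters, the contents of postdata.txt need to be URL-encoded. A small sketch of generating the file body from PHP follows. `build_postdata()` is a hypothetical helper; the field names simply mirror the postdata.txt example above.

```php
<?php
// Sketch only: build_postdata() is a hypothetical helper.
function build_postdata(array $fields) {
  // Passing '&' explicitly avoids depending on the
  // arg_separator.output ini setting.
  return http_build_query($fields, '', '&');
}

$body = build_postdata(array("foo" => "bar", "biz" => "baz"));
echo $body, "\n"; // prints: foo=bar&biz=baz

// To write the file that ab's -p option reads:
// file_put_contents("postdata.txt", $body);
```

The same helper would turn a value like “Jobs in Portland” into `Jobs+in+Portland`, which is what the application/x-www-form-urlencoded content type expects.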

Please leave a question in the comment section if anything is unclear!
