(ab)using Apache’s `ab` Command to “multi-thread” PHP Files
We are going to look at how to (ab)use Apache’s `ab` utility to run concurrent tasks in PHP. This can greatly speed up scraping, proxy checking, “stats geo-location,” and similar tasks.
It’s not uncommon to have a list of ‘todo’ items for your PHP script to process, where each item could be run concurrently and independently.
For example, let’s say your PHP script goes to your favorite search engine and searches for some keyword. One hundred results are returned and you want to download each of those result pages.
Fetching those pages one after another is slow; it would be faster to get, say, twenty-five pages at a time.
You could break this task up so that the “searcher” functionality is in one PHP file and the “page getter” functionality is in another. For example, `searcher.php` searches for “Jobs in Portland” and stores the resulting URLs in an SQL table, then `page_getter.php` goes through each URL one by one and downloads it.
Or, even worse than that, you can have the search and the page-getting functions in one file, which takes a long time to run, and if something goes wrong you have to backtrack.
It’s also possible that you want the links on the downloaded pages to be processed too. This is slow and cumbersome to do with either of the above methods.
What you really want is threading. PHP doesn’t have native threading support.
There are a few examples of faking PHP threads. Oftentimes they rely on calling `proc` or `exec`, both of which are tricky little functions to deal with, and many people (including myself) disable them in `php.ini` for security reasons.
Additionally, faking threading in PHP using proc/exec often means that the script had to be designed around the threading classes to begin with. With the method I use, you can get away with little or no modification to your existing “doer” type files. In the example above, the “doer” file (called “do.php” below) would be “page_getter.php.”
For those who have not used `ab` before, it’s the Apache benchmarking utility that comes standard with most Apache packages.
The man page for `ab` is very straightforward. We will mostly take advantage of the `-n` and `-c` options. The `-n` option sets the total number of requests to make to a URL, and `-c` sets the number of concurrent connections. By creating a “todo” table and setting `-c` greater than one, we can pipeline our processing.
Threading in PHP
Let’s show how this is done with a trivial example.
There are basically four steps.
1) Create and Populate a “todo” table.
2) SELECT and DELETE a todo item from that table, making sure that no two doers get the same todo item. This is the controller or “get next” file.
3) Include the do.php file, passing the value you selected/deleted from the todo list.
4) Optionally, insert any new tasks into the todo list.
It’s recommended, but not shown here, that you create a “done” list, and refuse to insert anything into todo that is already listed in “done.” This prevents infinite loops.
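The “done” list guard could be sketched roughly like this. It is only an illustration under assumptions: the table names `todo` and `done`, the `url` column, and the helper names `open_queue`, `enqueue`, `mark_done` and `waiting_count` are all made up for this example, and it uses PDO with SQLite so it can run on its own.

```php
<?php
// Sketch only: the todo/done tables, the url column and all function names
// are assumptions made for this example.
function open_queue(string $dsn = 'sqlite::memory:'): PDO
{
    $db = new PDO($dsn);
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE IF NOT EXISTS todo (url TEXT PRIMARY KEY)');
    $db->exec('CREATE TABLE IF NOT EXISTS done (url TEXT PRIMARY KEY)');
    return $db;
}

// Queue a URL unless it is already waiting in todo or recorded in done.
function enqueue(PDO $db, string $url): void
{
    $stmt = $db->prepare(
        'INSERT OR IGNORE INTO todo (url)
         SELECT :new_url
         WHERE NOT EXISTS (SELECT 1 FROM done WHERE url = :seen_url)'
    );
    $stmt->execute([':new_url' => $url, ':seen_url' => $url]);
}

// Record a finished item so that later enqueue() calls skip it.
function mark_done(PDO $db, string $url): void
{
    $db->prepare('DELETE FROM todo WHERE url = :url')->execute([':url' => $url]);
    $db->prepare('INSERT OR IGNORE INTO done (url) VALUES (:url)')->execute([':url' => $url]);
}

function waiting_count(PDO $db): int
{
    return (int) $db->query('SELECT COUNT(*) FROM todo')->fetchColumn();
}

$db = open_queue();
enqueue($db, 'http://example.com/a');
mark_done($db, 'http://example.com/a');
enqueue($db, 'http://example.com/a'); // skipped: already in done
enqueue($db, 'http://example.com/b');
echo waiting_count($db), " item(s) waiting\n"; // 1 item(s) waiting
```

Without the check against `done`, a page that links back to an already processed page would be queued again and again, which is the infinite loop mentioned above.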
Creating and Populating the todo table
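As a rough sketch of this step, assuming a table called `todo` with `id` and `url` columns (these names and the helper functions are my assumptions for illustration), the setup could look like the following. It uses an in-memory SQLite database through PDO so that it runs standalone; for real use you would point it at a SQLite file or a MySQL database that every request can reach.

```php
<?php
// Sketch only: the todo table and its id/url columns are assumed names.
function create_todo_table(PDO $db): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS todo (
        id  INTEGER PRIMARY KEY AUTOINCREMENT,
        url TEXT NOT NULL UNIQUE
    )');
}

// Insert each URL once; duplicates are skipped by the UNIQUE constraint.
function populate_todo(PDO $db, array $urls): void
{
    $stmt = $db->prepare('INSERT OR IGNORE INTO todo (url) VALUES (:url)');
    foreach ($urls as $url) {
        $stmt->execute([':url' => $url]);
    }
}

function todo_count(PDO $db): int
{
    return (int) $db->query('SELECT COUNT(*) FROM todo')->fetchColumn();
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
create_todo_table($db);
populate_todo($db, [
    'http://example.com/result/1',
    'http://example.com/result/2',
    'http://example.com/result/1', // duplicate, ignored
]);
echo todo_count($db), " item(s) queued\n"; // 2 item(s) queued
```

In the search example, `searcher.php` would be the script calling something like `populate_todo()` with the result URLs.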
Now we need the controller. This is the file that `ab` requests.
The Getnext Function (getnext.php)
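A minimal sketch of a “get next” controller, assuming a `todo(id, url)` table in SQLite accessed through PDO. The function name `claim_next` and the variable `$todo_item` are hypothetical. The important part is that the SELECT and the DELETE happen inside one transaction that takes the write lock first, so no two doers can receive the same item.

```php
<?php
// getnext.php (sketch). Assumes a todo(id, url) table; names are hypothetical.
function claim_next(PDO $db): ?string
{
    // BEGIN IMMEDIATE takes SQLite's write lock before the SELECT, so a second
    // request has to wait until this one has deleted its row and committed.
    $db->exec('BEGIN IMMEDIATE');
    try {
        $stmt = $db->query('SELECT id, url FROM todo ORDER BY id LIMIT 1');
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        $stmt->closeCursor();
        if ($row !== false) {
            $del = $db->prepare('DELETE FROM todo WHERE id = :id');
            $del->execute([':id' => $row['id']]);
        }
        $db->exec('COMMIT');
        return $row === false ? null : $row['url'];
    } catch (Throwable $e) {
        $db->exec('ROLLBACK');
        throw $e;
    }
}

// Demo with an in-memory database; a real setup needs a database that all
// concurrent requests share (a SQLite file or a MySQL server).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE todo (id INTEGER PRIMARY KEY, url TEXT NOT NULL)');
$db->exec("INSERT INTO todo (url) VALUES ('http://example.com/a'), ('http://example.com/b')");

while (($todo_item = claim_next($db)) !== null) {
    // In the layout described above, getnext.php would include do.php here.
    echo "claimed $todo_item\n";
}
```

The loop is only there to make the demo visible. When `ab` is driving things, each request to getnext.php would call `claim_next()` once and exit quietly if it returns `null`. On MySQL the same idea is usually expressed with `SELECT ... FOR UPDATE` inside an InnoDB transaction.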
Finally, the file that actually does something. In this example it doesn’t do much, but it’s included for completeness.
Do (do.php)
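A do.php in this spirit might look like the sketch below. It assumes the including script has set a variable named `$todo_item` (that name, the `do_item` function and the log file location are assumptions for illustration), and it only pretends to work by sleeping briefly and appending a line to a log file.

```php
<?php
// do.php (sketch). Assumes the including script set $todo_item; the variable
// name, the function name and the log path are hypothetical.
$todo_item = $todo_item ?? 'http://example.com/demo';

function do_item(string $item): string
{
    // Stand-in for real work such as downloading and parsing a page.
    usleep(10000); // pretend the task takes 10 ms
    return sprintf("%s done %s\n", date('c'), $item);
}

$line = do_item($todo_item);

// LOCK_EX keeps concurrent requests from interleaving their log lines.
file_put_contents(sys_get_temp_dir() . '/do.log', $line, FILE_APPEND | LOCK_EX);
```

An existing script such as `page_getter.php` could be dropped in here with little change, as long as it reads its work item from the variable the controller sets.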
Now all we need to do is call `ab` on the getnext.php file. Note that `ab` can be run from any server or from your desktop.
Calling getnext.php with ab
With `-n 200 -c 10` (that is, `ab -n 200 -c 10 http://site.faux/concurrent/getnext.php`), this processes 200 todo items over 10 concurrent connections. As an added bonus, you get cool output something like the following.
Running ab on getnext
Benchmarking site (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests

Server Software:        Apache/2.2.9
Server Hostname:        site.faux
Server Port:            80

Document Path:          /concurrent/getnext.php
Document Length:        0 bytes

Concurrency Level:      10
Time taken for tests:   2.408 seconds
Complete requests:      200
Failed requests:        0
Write errors:           0
Total transferred:      49800 bytes
HTML transferred:       0 bytes
Requests per second:    83.05 [#/sec] (mean)
Time per request:       120.412 [ms] (mean)
Time per request:       12.041 [ms] (mean, across all concurrent requests)
Transfer rate:          20.19 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2    7.6      0     47
Processing:     7  112  204.3     28   1479
Waiting:        7  111  204.3     28   1478
Total:          8  114  204.8     29   1479

Percentage of the requests served within a certain time (ms)
  50%     29
  66%     56
  75%    120
  80%    171
  90%    281
  95%    554
  98%    700
  99%   1472
 100%   1479 (longest request)

You can be more verbose using the `-v4` option in ab, which gives you the output of each call to do.php. This may be handy, or at least give you something to look at. You can also use `-g` to get TSV-format output, which you could pass to plotting software such as gnuplot.
That (ab)out wraps it up. You now have a tool to speed up your pipeline-able tasks in PHP.
Hint: This method doesn’t require the server hosting getnext.php to be running Apache, and PHP is hardly the only language you could do this in.
Note that you can’t really send signals to the do.php files like you can with true threads.
Edit
Sending Post data with Apache’s AB
Looking at the stats, this post is very popular and many people seem to want to know how to send POST data with `ab`. I will show an example. You will need a file with the POST data in it.
POST data (postdata.txt)
Then you will need to run `ab` with two new options: `-p` for the POST data file and `-T` to set the `Content-type` header.
Calling ab with POST data
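One way to produce a suitable POST data file is to let PHP do the URL encoding. This is just a sketch: the field names `keyword` and `page` are invented for illustration, and the file is written to the system temp directory so the example runs anywhere.

```php
<?php
// Field names below are hypothetical; use whatever your script reads from $_POST.
$fields = ['keyword' => 'Jobs in Portland', 'page' => 2];

// http_build_query() URL-encodes the pairs the way an HTML form would.
// The separator is passed explicitly so php.ini settings cannot change it.
$body = http_build_query($fields, '', '&');

// No trailing newline: ab sends the file contents as the request body as-is.
$path = sys_get_temp_dir() . '/postdata.txt';
file_put_contents($path, $body);

echo $body, "\n"; // keyword=Jobs+in+Portland&page=2
```

With that file in place, a call along the lines of `ab -n 200 -c 10 -p postdata.txt -T 'application/x-www-form-urlencoded' http://site.faux/concurrent/getnext.php` would send the same form body with every request. The host and path are taken from the sample output earlier in the post, so adjust them to your own setup.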
Please leave a question in the comment section if anything is unclear!