I'm something like "devops" here, which is probably just a fancy way of saying I'm about the only geek in the company. I don't know that this is rocket science, but I thought it might be of interest and perhaps help someone out there who has to manage scarce server resources (I can't be the ONLY one who can't just grab a company credit card & order up more servers any time we need extra computational power ...)
We run the Apache web server with PHP, and we also have a number of "cron jobs" that run to support the web site.
If you don't know what cron jobs are, they're the functional equivalent of "Scheduled Tasks" on a Windows workstation (but much older, having been in UNIX since the mid-to-late 1970s); they can be very handy, even important, in managing various aspects of a site. At my work we use a number of cron jobs to do things that don't need to be done on every WWW page load (for example, creating "category" link menus based on large numbers of products) or that simply need doing on a recurring basis (for example, sending out 'reminder' emails to users with products in an abandoned cart, or updating our "sitemap" files). A typical server's crontab file might look something like this:
#min hr day month weekday command
# Just after 7 AM/PM we sync the FOO database from the home office
03 7,19 * * * /usr/home/me/scripts/fetch_foo_db
#Is FOO up? Every half hour...
*/30 * * * * /usr/home/me/foochecker/is_foo_up.php > /dev/null 2>&1
07 0 * * * /usr/local/bin/php -q /usr/home/me/scripts/daily_stats
26 0 1 * * /usr/local/bin/php -q /usr/home/me/scripts/monthly_stats
35 7 * * 1 /usr/local/bin/php -q /usr/home/me/scripts/check_top_stories.php
04 0,12 * * * /usr/local/bin/php -q /usr/home/me/scripts/blogger_queue.php >/dev/null 2>&1
42 7 * * 1-5 /usr/home/me/scripts/backup_disk_check
#BAR sitemap re-generation
54 4 10 * * /usr/local/bin/php -q /usr/local/www/apache/data/bar.com/html/sitemaps/make_sitemap.php >/dev/null 2>&1
46 */8 * * * /usr/home/me/scripts/purge_binlogs.php
#STATIC.FOO.COM Scripts
*/10 * * * * /usr/home/me/scripts/sync_user_images > /dev/null 2>&1
1 */8 * * * /usr/home/me/scripts/sync_images > /dev/null 2>&1
3 */8 * * * /usr/home/me/scripts/sync_user_logos > /dev/null 2>&1
27 4 * * * /usr/home/me/scripts/image_confabulator > /dev/null 2>&1
#FOO CACHING. The homepage of FOO is fetched frequently. FOO uses this file to show the homepage to *most* visitors.
*/6 * * * * /usr/home/me/scripts/get_foo_index > /dev/null 2>&1
#bar.com now lives here; these scripts help us Frobnitz the Confabulator and Weefoo the Blargh ...
*/15 * 1-10 * * /usr/home/me/scripts/bar/makecode.php > /dev/null 2>&1
*/15 * 1-10 * * /usr/home/me/scripts/bar/makelists.php > /dev/null 2>&1
#Clean out /var/cache/pkg on 1st of month
27 4 1 * * /usr/local/bin/sudo /usr/sbin/pkg clean --yes --quiet
#ditto the images cache > 180 days just before midnight each night
52 23 * * * /usr/bin/find /usr/local/www/apache/data/foo.com/html/imagecache/rendered -mtime +180 | /usr/bin/xargs /bin/rm
#let me know when that kid in editorial posts new stories
14,29 7-16 * * 1-5 /usr/home/me/scripts/cms_watcher
44,59 7-16 * * 1-5 /usr/home/me/scripts/cms_watcher
Of course we tried to make sure these jobs ran quickly, and at times that shouldn't conflict with one another; but as we added more jobs, we began to find that they were sometimes running concurrently and eating up server resources (and no, porting them from PHP to Python or "sh" didn't help significantly with this). The problem grew as we scaled up the number of users, the number of products, and so on. We might notice high load on the server, log in, and see three or four of these jobs running at once, because one hadn't finished when the next started, and things really began to "get bogged down" (for example, many of these jobs hit the local MySQL server, which is already strained by Apache when the site is busy).
We needed the cron jobs to "wait their turn" on the system, running only one at a time (sequential processing), so that as much RAM, disk, and CPU as possible went to Apache instead of our background tasks.
So, what did we do?
First, we removed all the jobs from the crontab file and sorted them "on paper" into three classifications: those done daily, those done hourly, and those done more frequently. Then we set up only three jobs in the server's crontab(5) file:
#min hr day month weekday command
59 * * * * /home/scripts/crontabs/hourly.php > /dev/null 2>&1
02 0 * * * /home/scripts/crontabs/daily.php > /dev/null 2>&1
*/10 * * * * /home/scripts/crontabs/frequent.php > /dev/null 2>&1
To ensure sequential instead of concurrent job processing, only one of these jobs may run at a time. We enforce this with a lock file, shown here from the "daily.php" cronjob file:
//check for the lockfile; sleep while another job holds it
$lockfile = "/tmp/cron.lock";
while (file_exists($lockfile)) {
    sleep(15); //the hourly job sleeps 60 seconds instead
}
//no lock exists, so create our instance of the lockfile
file_put_contents($lockfile, "daily"); //lockfile contains the name of the job that created it
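One caveat worth mentioning: there is a small race between the file_exists() check and the file_put_contents() call, since two runners could both see no lock and then both create one. If that ever became a problem, PHP's fopen() with the "x" mode (which fails if the file already exists) would close the gap; a minimal sketch, not our production code:
//attempt to create the lockfile atomically; fopen('x') returns false if it already exists
while (($fp = @fopen($lockfile, "x")) === false) {
    sleep(15); //someone else holds the lock; wait and retry
}
fwrite($fp, "daily"); //record which job created the lock
fclose($fp);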
Rather than sleeping forever, the hourly job exits if the existing lockfile was created by another hourly run (a somewhat scary prospect; it means an hourly run has been going for a full sixty minutes!). If the lock belongs to the "daily" cron or a "frequent" cron, it sleeps instead:
if (file_exists($lockfile) && trim(file_get_contents($lockfile)) == "hourly") {
    exit; //a previous hourly run is still going
}
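Putting those two behaviors together, the top of our hourly.php presumably reads something like this (a sketch, assuming the same $lockfile path):
$lockfile = "/tmp/cron.lock";
if (file_exists($lockfile) && trim(file_get_contents($lockfile)) == "hourly") {
    exit; //the previous hourly run is STILL going after an hour; bail out
}
while (file_exists($lockfile)) {
    sleep(60); //a daily or frequent run holds the lock; wait our turn
}
file_put_contents($lockfile, "hourly");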
The frequent job exits if the lockfile exists at all ... after all, it will be run again in 10 minutes, so no great loss there:
if (file_exists($lockfile)) {
    exit; //we'll be back in 10 minutes anyway
}
A "job file" looks something like this (note that our scripts are set executable in all cases):
// 1. Copy "top" logs
system("/home/me/administrative_scripts/endtop.sh >/dev/null 2>&1");
// 2. Listing History
system("/home/me/administrative_scripts/listing_history.php");
// 3. FOO Status Monitor
system("/home/me/administrative_scripts/foo/foo_status_monitor");
Now, cron(8) has "to the minute" granularity, which we have sacrificed; but it's not really a sacrifice, as we WANTED strict sequential processing. It also has "any time" ability: if you want a job to run at 2:49 PM, cron can make it so. If you want a job to run twice an hour, or every other day, or every Tuesday, you can do that too. We've attempted to retain some of this ability in our "replacement system" by setting time vars inside the job file:
//what time is it?
$minute = intval(date('i'));
$hour = intval(date('G')); //0,1,2...23
$even = (intval(date('z')) % 2 == 0); //true on even days of the year: our "every other day" flag
These can be used for more granular job scheduling. A job can be run "every other day" like this:
if ($even) {
    system("/home/me/public_html/includes/make_sitemap"); //create our sitemap
}
Here's a job that runs twice a day; it's placed in the "hourly" job file:
if ($hour == 4 || $hour == 16) {
    system("/home/me/administrative_scripts/foo/dump_FOO_db"); //dump the DB
}
I can run a job at a specific hour with "if ($hour == 5)" or several times a day with "if (in_array($hour, array(1,7,14,21)))". In job-file form (the script paths here are made up for illustration), those gates look like this:
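//run once a day, during the 5 AM hour
if ($hour == 5) {
    system("/home/me/administrative_scripts/rotate_logs"); //hypothetical script
}
//run four times a day
if (in_array($hour, array(1, 7, 14, 21))) {
    system("/home/me/administrative_scripts/prune_sessions"); //hypothetical script
}
The same thing holds true for running jobs more than once an hour via the frequent job file: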
//the hour is segmented for jobs whose interval exceeds 10 minutes
$zero = $one = $two = $three = $four = $five = false;
if ($minute < 10) { $zero = true; }
elseif ($minute < 20) { $one = true; }
elseif ($minute < 30) { $two = true; }
elseif ($minute < 40) { $three = true; }
elseif ($minute < 50) { $four = true; }
else { $five = true; }
// 1. "Hottest" Item List
if ($two || $five) {
    system("/home/me/administrative_scripts/hot_item"); //runs at 20 minutes after, and 50 after the hour
}
You could combine time vars to express other schedules ("five o'clock every other day"), or implement a weekday variable in order to run jobs only on Tuesdays (we don't need that on this server ... our least-frequent job runs every 48 hours).
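If we ever did need it, both of those are one-liners in the same pattern; a sketch, with a hypothetical script path (date('N') returns 1 for Monday through 7 for Sunday):
//"five o'clock every other day", combining two of the vars above
if ($even && $hour == 17) {
    system("/home/me/administrative_scripts/alt_day_report"); //hypothetical script
}
//run only on Tuesdays
$weekday = intval(date('N')); //1 = Monday ... 7 = Sunday
if ($weekday == 2) {
    system("/home/me/administrative_scripts/tuesday_report"); //hypothetical script
}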
For practicality's sake, because the hourly jobs are likely to be running at the "top of the hour" and the frequent cronjob exits if the lockfile is in place, we tend to place most of our "few times an hour" jobs in the 20-60 minute window.
That's about it. By doing this we were able to ensure that Apache never had to seriously contend with our background tasks running under cron(8).