Cron Jobs: Best Practice and Troubleshooting

UNIX cron dates back to the late 1970s, according to Wikipedia. Its maturity and robustness are unmatched by any other scheduling facility. This short article discusses aspects of scheduling cron jobs that are not obvious to ordinary users and suggests best practices. It assumes Linux as the operating system unless otherwise noted.

1. Cron environment

By far the most common problems with cron jobs are caused by their minimal environment. You can see the cron job environment by scheduling this job:

<time> /usr/bin/env > /tmp/env.out

After it runs, env.out may contain only these lines:

SHELL=/bin/sh
USER=oracle
PATH=/usr/bin:/bin
PWD=/home/oracle
SHLVL=1
HOME=/home/oracle
LOGNAME=oracle
_=/usr/bin/env

The fastest way to solve the problem of a script that runs fine on the shell command line but not as a cron job is to compare the above output with your shell's `env' output (for easier comparison, whether visual or with `diff', sort both results first). Then, starting with an empty environment, add one or a few environment variables at a time to a test command, e.g.,

$ env - PATH=/bin:/usr/bin:/u01/app/oracle/product/10.2.0/db/bin ORACLE_HOME=/u01/app/oracle/product/10.2.0/db /path/script

`env -' strips off all environment variables, so the command starts with an empty environment. The syntax VAR=value command is a Bourne shell (and descendant) convention: command is run with environment variable VAR set to value, plus any other environment variables inherited from its parent, i.e. your shell for a command typed on the command line, or crond and /etc/passwd for a cron job. Here's one real example. A script sorted a list of strings, and the cron job running this script always sorted them differently from running the script on the command line. By adding a few environment variables at a time, the cause was identified as a missing LANG=en_US.UTF-8 setting in the cron environment.
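The comparison step can be scripted. The sketch below assumes the cron job above has already written its environment to /tmp/env.out, and the function name env_only_in_shell is hypothetical. It prints the NAME=value lines that exist in your interactive shell but are missing from, or different in, the cron environment. Those lines are the candidates to add back one at a time.

```shell
#!/bin/bash
# Hypothetical helper: print environment lines (NAME=value) present in the
# current shell but not in the cron environment saved in file $1.
env_only_in_shell() {
    local cron_env=${1:-/tmp/env.out} cron_sorted shell_sorted
    cron_sorted=$(mktemp) || return 1
    shell_sorted=$(mktemp) || { rm -f "$cron_sorted"; return 1; }
    if ! sort "$cron_env" > "$cron_sorted"; then
        rm -f "$cron_sorted" "$shell_sorted"
        return 1
    fi
    env | sort > "$shell_sorted"
    # comm -13 suppresses lines unique to the cron file and lines common to
    # both, leaving only lines found in the shell's environment
    comm -13 "$cron_sorted" "$shell_sorted"
    rm -f "$cron_sorted" "$shell_sorted"
}
```

Run it from the same login shell in which the script works, e.g. env_only_in_shell /tmp/env.out.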

Although not the fastest way, another approach is to go in the opposite direction: gradually add the shell environment variables to the cron job, either in the script or in the crontab. If the latter, use the syntax <time> VAR1=value1 VAR2=value2 /path/script.

Incidentally, although it is unrelated to cron jobs, a cautionary note is worthwhile: some programs are picky about the exact value of an environment variable. Oracle, for instance, usually runs with the environment variable ORACLE_HOME set to /u01/.../db. A cron job that interfaces with Oracle should not set this variable to /u01/.../db/ or /u01//.../db, even though cd to either of the latter two lands you in the same directory on UNIX/Linux.
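A cron script that talks to Oracle could check for such variants before doing any work. This is a sketch with a hypothetical function name. It only normalizes slashes and does not resolve symbolic links or check that the directory exists.

```shell
#!/bin/bash
# Hypothetical guard: succeed only if $1 (default: $ORACLE_HOME) has no repeated
# slashes and no trailing slash; /u01/x/db passes, /u01/x/db/ and /u01//x/db fail.
oracle_home_is_canonical() {
    local home=${1-$ORACLE_HOME} canonical
    # Squeeze runs of "/" into one, then drop a trailing "/" (a lone "/" is kept)
    canonical=$(printf '%s\n' "$home" | sed -e 's#//*#/#g' -e 's#\(.\)/$#\1#')
    if [ "$home" != "$canonical" ]; then
        echo "ORACLE_HOME '$home' should be '$canonical'" >&2
        return 1
    fi
}
```

Near the top of the cron script, oracle_home_is_canonical || exit 1 stops the run before Oracle sees the wrong value.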

2. Redirecting output

Many people setting up a cron job either forget to redirect stdout and stderr or explicitly throw them away, as in this crontab:

<time> /path/script_with_no_output_redirect
<time> /path/script_with_no_output > /dev/null 2>&1

The first job will send the script's stdout and stderr (if any) to the user's mail account on the host. The second will dump them into the bit bucket. The first practice is not recommended because, when you have a problem, it's not easy to check mail on the host (it won't be in your Microsoft Outlook!), and the emails are inconvenient to manage. It's even worse to throw away the precious debugging information altogether. I suggest keeping the output in a file or two, even if the job runs correctly most of the time.

<time> /path/script1 > /tmp/script1.out 2>&1 #bash also accepts &> /tmp/script1.out, but cron runs this line with /bin/sh, where &> may not work
<time> /path/script2 > /tmp/script2.out 2>/tmp/script2.err

Since the output file is truncated every time the job runs, you don't need to worry about it growing. (If you need to preserve records of old runs, you do need to handle growth, by rotating the output or by appending to the file with >>.) For security reasons, the output file should not contain sensitive information such as passwords. If it must, protect it with an explicit chmod in your script or crontab entry, or with a restrictive umask.

Redirecting output this way does not replace any output redirection explicitly programmed in your script. The cron job redirect is a catch-all for any remaining "fallout". In normal runs the output file is likely to be empty, but when it is not, the information in it can be extremely helpful in troubleshooting.
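If you keep old runs by rotating the output, the rotation must happen before the redirect. This is because the redirect in the crontab entry truncates the file before the script even starts. The sketch below shows one way to do it in a small bash wrapper that cron calls instead of the script. The names rotate_output and run_logged are hypothetical, and keeping five copies is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: shift file.1 -> file.2 ... up to file.$keep, then file -> file.1.
rotate_output() {
    local file=$1 keep=${2:-5} i
    for ((i = keep - 1; i >= 1; i--)); do
        if [ -e "$file.$i" ]; then mv -f "$file.$i" "$file.$((i + 1))"; fi
    done
    if [ -e "$file" ]; then mv -f "$file" "$file.1"; fi
}

# Hypothetical wrapper: rotate $1, then run the remaining arguments with stdout and
# stderr going to $1, which is created under umask 077 (readable by the owner only).
run_logged() {
    local out=$1
    shift
    rotate_output "$out" 5
    ( umask 077; "$@" > "$out" 2>&1 )
}
```

A wrapper script defining these functions and ending with run_logged /tmp/script1.out /path/script1 could then be scheduled in place of /path/script1.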

3. Limitations of cron

The minimum interval for cron job execution is one minute. If you must run a job more frequently than that, simply create a loop in your own script with a short sleep in between. If the script must not be modified for any reason, schedule multiple jobs for every minute, each except one running a command like /bin/sleep <seconds>; /path/script with a different number of seconds.
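The loop approach might look like the following sketch, started by cron once a minute. The function name run_every is hypothetical. It sleeps after each run finishes, so the runs drift if the job itself takes time. Keep count times interval under 60 seconds so that one minute's loop does not overlap the next.

```shell
#!/bin/bash
# Hypothetical helper: run the command given after the first two arguments
# $2 times, sleeping $1 seconds between runs (not after the last one).
run_every() {
    local interval=$1 count=$2 i
    shift 2
    for ((i = 1; i <= count; i++)); do
        "$@"
        if ((i < count)); then sleep "$interval"; fi
    done
}
```

For example, run_every 15 4 /path/script runs the script at roughly :00, :15, :30 and :45 of each minute, provided the script itself is quick.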

As with any host-based job scheduler, there is no easy way to coordinate cron job executions between different hosts so that, say, job 2 on host 2 runs immediately after job 1 on host 1 completes. If such immediate continuation is required, job scheduling software that coordinates jobs across hosts is preferred. Otherwise, there are two options. You can merge all the actions into scripts that run on one host and make remote shell calls to the others. Or you can use some means of notification across hosts. One example is the email forwarding mechanism with a pipe to a command as the recipient (| cmd in ~/.forward), provided the host is configured to receive email from the other host.

While crontab is flexible enough to allow many types of schedules, some scheduling requirements are awkward to implement, such as "first Friday":

* * * * 5 [ `date +\%e` -lt 8 ] && /path/script

In fact, the logic is handled not by cron but by the command; of course, you could also put the logic in the script itself. Although the Wikipedia cron page lists characters such as L, W and #, which seem to implement such schedules, they are not widely supported.
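If the logic goes into the script, a small test for "the Nth given weekday of the month" covers "first Friday" and similar schedules. This sketch assumes GNU date (for -d and %-d), and the function name is hypothetical. The weekday uses the same 0 = Sunday numbering as crontab's fifth field.

```shell
#!/bin/bash
# Hypothetical helper: succeed if date $3 (default: today) is the $1-th
# occurrence of weekday $2 (0=Sunday ... 6=Saturday) in its month.
nth_weekday_of_month() {
    local n=$1 want_dow=$2 d=${3:-today} dom dow
    dom=$(date -d "$d" +%-d)   # day of month without padding (GNU %-d)
    dow=$(date -d "$d" +%w)    # 0=Sunday, as in crontab
    # Days 1-7 hold the 1st occurrence of each weekday, 8-14 the 2nd, and so on
    (( dow == want_dow && (dom - 1) / 7 + 1 == n ))
}
```

With the job scheduled for Fridays, the line nth_weekday_of_month 1 5 || exit 0 near the top of the script makes it do its work only on the first Friday.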

In one real example, a group of servers is automatically rebooted at 6 AM on the third Friday after the second Tuesday of each month, after the system admins' patch work during the day. To avoid sending an alert to the on-call person at this early hour, a cron job that starts a short blackout of the Oracle agent is installed as follows:

59 5 * * 5 /path/agent_blackout.sh > /tmp/agent_blackout.out 2>&1

where agent_blackout.sh has these lines near the top:

#(( $(date +"%w" -d "17 days ago") != 2 )) && exit #17 days ago, it was not Tuesday. Can be omitted if scheduled for Fri only
(( $(date +"%e" -d "17 days ago") < 8 )) && exit  #If 17 days ago it was not between 8th and
(( $(date +"%e" -d "17 days ago") > 14 )) && exit #14th, it would not be the second Tues.
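The same check can be written as a function that takes the date as an argument. That makes it easy to try on past or future dates before trusting it in cron. This is a sketch assuming GNU date, and the function name is hypothetical. It computes today's date in local time, but does the day arithmetic with TZ=UTC to avoid daylight-saving surprises.

```shell
#!/bin/bash
# Hypothetical helper: succeed if date $1 (YYYY-MM-DD, default: today) is a Friday
# whose date 17 days earlier (necessarily a Tuesday) fell on the 8th-14th,
# i.e. was the second Tuesday of its month.
is_reboot_friday() {
    local d=${1:-$(date +%F)} dow dom
    dow=$(TZ=UTC date -d "$d" +%w)              # 5 = Friday
    dom=$(TZ=UTC date -d "$d -17 days" +%-d)    # day of month 17 days earlier
    (( dow == 5 && dom >= 8 && dom <= 14 ))
}
```

The date lines in agent_blackout.sh could then become is_reboot_friday || exit.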

Another group of servers is rebooted on the "fifth Tuesday" of each month, meaning the Tuesday three weeks after the second Tuesday (which may fall in the following month), and has this cron job:

59 5 * * 2 /path/agent_blackout.sh > /tmp/agent_blackout.out 2>&1

with these lines in agent_blackout.sh:

(( $(date +"%e" -d "21 days ago") < 8 )) && exit  #If 21 days ago it was not between 8th and
(( $(date +"%e" -d "21 days ago") > 14 )) && exit #14th, it would not be the second Tues.
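Schedules like these are easy to get wrong. Before installing a guard, it can help to list every date over a whole year on which it would let the job run. The sketch below is hypothetical and assumes GNU date. Weekdays use crontab's 0 = Sunday numbering.

```shell
#!/bin/bash
# Hypothetical helper: print every date in year $1 that falls on weekday $2
# (0=Sunday ... 6=Saturday) and whose date $3 days earlier was the 8th-14th.
preview_run_dates() {
    local year=$1 want_dow=$2 offset=$3 d dom
    d="$year-01-01"
    while [ "$(TZ=UTC date -d "$d" +%Y)" = "$year" ]; do
        if (( $(TZ=UTC date -d "$d" +%w) == want_dow )); then
            dom=$(TZ=UTC date -d "$d -$offset days" +%-d)
            if (( dom >= 8 && dom <= 14 )); then echo "$d"; fi
        fi
        d=$(TZ=UTC date -d "$d +1 day" +%F)
    done
}
```

For example, preview_run_dates 2024 2 21 lists the Tuesdays accepted by the 21-day guard above, and preview_run_dates 2024 5 17 does the same for the Friday guard.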

4. Caution

The command `crontab -r' removes your crontab. Because the 'r' and 'e' keys are adjacent on a standard keyboard, there is some risk of accidentally removing the crontab instead of editing it. It is good practice to save your crontab entries to a file periodically, even with a cron job.
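One possible backup is sketched below. The function name and the default directory ~/crontab_backup are hypothetical. A failed crontab -l, for example when the user has no crontab, leaves the existing backups untouched.

```shell
#!/bin/bash
# Hypothetical helper: save the current user's crontab as $1/crontab.YYYYMMDD
# (default directory: ~/crontab_backup). If crontab -l fails, the partial file
# is removed and older backups are left as they are.
backup_crontab() {
    local dir=${1:-$HOME/crontab_backup} tmp
    mkdir -p "$dir" || return 1
    tmp="$dir/.crontab.$$"
    if crontab -l > "$tmp"; then
        mv -f "$tmp" "$dir/crontab.$(date +%Y%m%d)"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

A simpler alternative works directly in the crontab. An entry such as 0 6 * * * crontab -l > $HOME/crontab.bak.$(date +\%a) keeps a rolling week of copies. Note that % must be escaped in a crontab entry, as in the "first Friday" example.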

Troubleshooting certain problems requires root access. Suppose a job is found not to have run in the past cycle, and neither its own log nor its stdout and stderr help (or they are empty). In that case, consult /var/log/cron (or cron.*). In one incident we had, the user's password had inadvertently been left subject to a 90-day expiration. When the password expired one night, all the jobs stopped running until the next morning, when the password issue was resolved. Errors in /var/log/cron revealed the root cause.

If you have trouble with the five time and date fields in crontab, crontab guru is a good website for checking their meaning: you can enter numbers or expressions and see the interpretation. The five fields are mostly AND'ed but sometimes OR'ed. Specifically, when both the 3rd field (day of month) and the 5th field (day of week) are restricted, i.e. neither is *, they are OR'ed: the job runs on any day that matches either one. Thus, if today is the 28th and a Wednesday, both of the schedules below will run today

* * 28 * 1 /script  #runs on 28th of every month and also runs on each Monday 
* * 27 * 3 /script  #runs on 27th of every month and also runs on each Wednesday

in addition to the obvious ones

* * 28 * * /script
* * * * 3 /script
* * 28 * 3 /script

If you need a schedule for the 28th only when it is also a Wednesday, some programming logic is needed.
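One way to get "the 28th, but only if it is a Wednesday" is to leave the day-of-week field as *, so that cron's OR rule does not apply, and test the weekday in the script. Here is a sketch assuming GNU date, with a hypothetical function name.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if date $3 (default: today) is day-of-month $1
# AND weekday $2 (0=Sunday ... 6=Saturday, as in crontab's fifth field).
dom_and_dow() {
    local want_dom=$1 want_dow=$2 d=${3:-today}
    (( $(date -d "$d" +%-d) == want_dom && $(date -d "$d" +%w) == want_dow ))
}
```

With a crontab entry such as * * 28 * * /script, the line dom_and_dow 28 3 || exit 0 near the top of /script skips every 28th that is not a Wednesday.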

Step values such as */number for day of month and month start with 1 instead of 0, because there is no 0th day or month. This is unlike minute, hour and day of week, which start with 0 (0 means Sunday for day of week). For example, * * */5 * * means every minute on the 1st, 6th, 11th, 16th, ... of every month, and * * * */5 * means every minute in January, June and November.
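In other words, a step counts from the lowest allowed value of its field. The tiny sketch below prints the values a */step expression selects for a given range. The function name is hypothetical, and it uses seq from coreutils.

```shell
#!/bin/bash
# Hypothetical helper: list the values matched by */$3 in a field whose
# allowed range is $1-$2 (e.g. 1-31 for day of month, 0-59 for minute).
expand_step() {
    local min=$1 max=$2 step=$3
    seq -s ' ' "$min" "$step" "$max"
}
```

For example, expand_step 1 12 5 gives 1 6 11, the January, June and November above. expand_step 0 59 15 gives the minutes 0 15 30 45.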

Some implementations of cron require that there be no blank lines in the crontab, or, more strangely but rarely, that there be a blank line (or at least a newline after the last entry) at the end. These seem to be issues in older versions of cron.
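A quick check for both problems can be run on a saved copy of the crontab before loading it. The sketch below is hypothetical. It flags every blank line, which suits implementations that reject them. Adjust it if yours is one of the rare ones that wants a trailing blank line.

```shell
#!/bin/bash
# Hypothetical helper: warn about blank lines and a missing final newline in a
# crontab file (e.g. one saved with crontab -l > file). Returns 1 if either is found.
lint_crontab() {
    local f=$1 rc=0
    # grep -n prints each blank (or whitespace-only) line with its line number
    if grep -n '^[[:space:]]*$' "$f"; then
        echo "warning: blank line(s) listed above in $f" >&2
        rc=1
    fi
    # $(tail -c 1 ...) is empty only if the file ends with a newline
    if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
        echo "warning: last line of $f does not end with a newline" >&2
        rc=1
    fi
    return "$rc"
}
```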

If you temporarily change the crontab and intend to change it back later, make absolutely sure you remember to do so, because forgetting can have grave consequences. To avoid them, set a reminder in a personal calendar, perhaps more than once.

2012-2013

First published in IOUG Tips & Best Practices Booklet, 2013 ed. (but missing the second half of section 3 "Limitations of cron").
