ckweb (Check Web Site Availability)

Based on the more popular Tcl language, Expect is an "obscure" language for automating console commands that insist on terminal I/O. You don't hear people talk much about it, but when you need it, you need it badly. You can't use a here document in any UNIX shell to automate passwd, telnet or sudo input, but Expect can. Even where you can pipe input to a command, as with ftp -n <<EOF, Expect gives you much finer control over what you do.

One project where I thought I could put Expect to good use is web site monitoring. Sure, there are already many scripts out there that can do this job. But my ckweb script is unique in that it talks HTTP natively; you can control every aspect of the communication between your Expect-emulated browser and the web server.

How does it work? As you may know, you can use telnet to talk to a web server. You type the telnet command, the GET line and the empty line after it; everything else is output:
$ telnet www.uh.edu 80
Trying 123.45.67.89...
Connected to www.uh.edu.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Mon, 29 Oct 2001 17:08:56 GMT
Server: Apache/1.3.9 (Unix)
Connection: close
Content-type: text/html

<html>
<head>
<title>Welcome to University of Houston</title>
...

Or, if you use HTTP/1.1 instead (all modern browsers do; note the Host header, which HTTP/1.1 requires):
$ telnet www.uh.edu 80
Trying 123.45.67.89...
Connected to www.uh.edu.
Escape character is '^]'.
GET / HTTP/1.1
Host: www.uh.edu

HTTP/1.1 200 OK
...

Or, when you have to go through a proxy server:
$ telnet myproxy.mycomp.com 8080
Trying 123.45.67.89...
Connected to myproxy.mycomp.com.
Escape character is '^]'.
GET http://www.uh.edu HTTP/1.0

HTTP/1.1 200 OK
Date: Mon, 29 Oct 2001 17:08:56 GMT
...
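
The three requests typed above differ only in the request line and the Host header. Here is a minimal Python sketch of that difference. It is not part of ckweb.exp; the function name build_request and its parameters are my own, chosen for illustration. Note that the sketch always includes the path, so the proxy request line reads http://www.uh.edu/ rather than http://www.uh.edu.

```python
def build_request(host, path="/", version="1.0", via_proxy=False, extra_headers=None):
    """Return the text you would type after telnet connects (hypothetical helper)."""
    # Through a proxy, the request line carries the absolute URL instead of the path.
    target = f"http://{host}{path}" if via_proxy else path
    lines = [f"GET {target} HTTP/{version}"]
    # HTTP/1.1 requires a Host header; HTTP/1.0 does not.
    if version == "1.1":
        lines.append(f"Host: {host}")
    # Any further headers, e.g. User-Agent, go after Host.
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line ends the request headers.
    return "\r\n".join(lines) + "\r\n\r\n"
```

Calling build_request("www.uh.edu", version="1.1") gives the second request above; adding via_proxy=True gives the form sent to a proxy.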

Expect is a language that completely emulates or automates what you type on your terminal. If you schedule the script as a UNIX cron (or Windows AT) job, your web site can be monitored continuously. The benefit of using Expect is that as your understanding of the HTTP protocol (RFC 2068 and its successors) grows, your ckweb.exp can become more sophisticated. You don't need to wait for the author of HTTPPing or wget to add a feature you need. And Expect allows you to take any action in response to any response, or lack of one, from the web server or from anything in the middle (DNS, proxy, etc.).

Once you know Expect, you can turn ckweb.exp into a script called ckmailsvr.exp to check your mail server, or cknntp.exp to check your news server. In fact, ckweb.exp can serve as a template for remotely monitoring any text-based application protocol that runs over TCP, with control limited only by your knowledge of the protocol's RFC.

To install, first download and install Expect if you don't already have it on your system. Then download ckweb.exp and its companion wrapper script ckweb.ksh. The wrapper shell script also lets you specify a no-check window, for example across a weekend. Note that ckweb.ksh needs a KornShell newer than the Nov 88 version for a little string manipulation; check the version of your KornShell with the command what /bin/ksh or strings /bin/ksh | fgrep '@(#)' (replace /bin/ksh with the path of your binary).

Now the "hard" part. Write the web site list file as follows.
#Col 1: Recipient list separated by comma (no space); check ckweb.ksh for actual emails
#Col 2: Web site short description (no colon allowed)
#Col 3: URL (no HTTP port)
#Col 4: HTTP port (usually 80)
#Col 5: HTTP return code (usually 200); specifying multiple codes separated by
#       comma means any of them is acceptable
#Col 6: beginning day of week not to check (0 = Sunday ... 6 = Saturday)
#Col 7: beginning time not to check (24-hour hhmm, e.g. 1930)
#Col 8: end day of week not to check
#Col 9: end time not to check
#Col 10: P to indicate proxy should be used to access this URL
#Colon (:) is field delimiter. # has to be at line beginning to start comment.
#Missing cols 6-9 means no no-check window; if these fields need to be skipped, set
#them to null, e.g. A:MySite:www.mysite.com:80:200:::::P

A,B:Our Company Internet Site:www.ourcompany.com:80:200
C:Intranet Site:my.ourcompany.com:80:302:6:1930:0:1930
A:Our HR Site:myhr.ourcompany.com/login.asp:8000:200:6:1930:0:1930
B,C:Our Primary Client:b2b.ourclient.com:8888:200,401:5:2000:1:0500:P

I hope the wording in the weblist file makes sense. We entered four sites.

The first site should return code 200 when ckweb telnets to port 80 and sends GET (or HEAD, if you don't care about the web page content, which you don't anyway; modify ckweb.exp to use HEAD in place of GET). The site should be up all the time; if not, recipients A and B as defined in ckweb.ksh will be notified.

The second site should be up at any time except during the Saturday 7:30pm to Sunday 7:30pm window. When connected, the site should return code 302 ("Moved Temporarily", which really means redirected). You could go on to check the redirected page instead of the redirecting page if you wish.

The third entry checks the URL you would enter in your Web browser as http://myhr.ourcompany.com:8000/login.asp. Note that in our web list file, the port number goes in its own field after the URL. The next version of ckweb should handle the exact URL you type in your browser.

The last entry checks a Web site through your company proxy server, hence the P at the end. (Sorry, the current ckweb doesn't support proxies that require a username/password login. I'll correct that later.) The acceptable return code for this site is either 200 or 401; there are some sites out there that return different codes at different times. Its no-check window runs from Friday 8pm to Monday 5am, so we only monitor the site on weekdays.
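
To make the file format concrete, here is a rough Python sketch of how one line could be split into its ten columns and how a no-check window that wraps around the weekend could be tested. This is not how ckweb.ksh does it (that is KornShell); the names Site, parse_line and in_no_check_window are made up for illustration. The sketch assumes days are numbered 0 = Sunday through 6 = Saturday, as the examples above imply, and it treats the window's start as inclusive and its end as exclusive, which is my assumption.

```python
from typing import NamedTuple, Optional

class Site(NamedTuple):
    recipients: list
    description: str
    url: str
    port: int
    codes: list
    window: Optional[tuple]  # (start_day, start_hhmm, end_day, end_hhmm) or None
    use_proxy: bool

def parse_line(line):
    """Parse one weblist line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split(":")
    fields += [""] * (10 - len(fields))       # pad missing trailing columns
    win = fields[5:9]
    window = tuple(int(x) for x in win) if all(win) else None
    return Site(fields[0].split(","), fields[1], fields[2], int(fields[3]),
                [int(c) for c in fields[4].split(",")], window,
                fields[9].strip() == "P")

WEEK_MINUTES = 7 * 24 * 60

def week_minute(day, hhmm):
    """Minutes since Sunday 00:00 for a day number and an hhmm time."""
    return day * 1440 + (hhmm // 100) * 60 + hhmm % 100

def in_no_check_window(site, day, hhmm):
    """True if (day, hhmm) falls inside the site's no-check window."""
    if site.window is None:
        return False
    start_day, start_time, end_day, end_time = site.window
    start = week_minute(start_day, start_time)
    end = week_minute(end_day, end_time)
    now = week_minute(day, hhmm)
    # Modulo arithmetic handles windows that wrap from Saturday into Sunday.
    return (now - start) % WEEK_MINUTES < (end - start) % WEEK_MINUTES
```

With the second sample entry, in_no_check_window(site, 0, 1200) is true (Sunday noon) and in_no_check_window(site, 0, 2000) is false (Sunday 8pm).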

There are some fundamental limitations to ckweb. If the Web site speaks HTTPS instead of HTTP, you won't be able to use telnet or ckweb to communicate with it; a programming language module with SSL support has to be used instead. Since J2SE (Java 2 Standard Edition) comes with Web and SSL support, the easiest way without installing additional software is to use Java. ckweb also has problems going beyond regular browser navigation. Suppose that in your real browser you go to some page which launches a Java applet outside of and independent of your browser window (as in the case of Oracle JInitiator). Then there's only one technology that can emulate and automate this process, namely grabbing your mouse and keyboard as Mercury WinRunner does. Of course, WinRunner is limited to Windows and takes control of the entire foreground, so you can't run multiple sessions on one computer.

Appendix

I. How do you find your proxy server?

If you can connect to the Internet, in most cases you can find it in your browser settings. If you use Netscape, go to Preferences | Advanced | Proxies and you'll know. In Internet Explorer, go to Internet Options | Connections | LAN Settings (if you use phone dialup to get online, go to the corresponding dialup setting instead of LAN Settings). However, many companies advise users to use an automatic proxy script, such as http://ourproxy:8080/proxy/proxy.pac. In this case, you shouldn't assume the proxy server is ourproxy, although in most cases it is, or is one of them. To find out, show the proxy.pac script on screen:
$ telnet ourproxy 8080
Trying 123.45.67.89...
Connected to ourproxy.
Escape character is '^]'.
GET /proxy/proxy.pac HTTP/1.0

...
        return "PROXY px1.ourdomain.com";
...
        return "PROXY px2.ourdomain.com";
...
In the above case, the actual proxy servers are named px1 and px2. If you don't know which one to pick, read the output, which is JavaScript, to see whether the servers listed are there for load balancing (then pick any one) or are chosen conditionally (then pick the one that applies to your connection). Otherwise, just try them all one at a time.
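
If the script is long, you can pull the candidates out mechanically. Here is a small Python sketch that lists every server named after the word PROXY in the text of a .pac file. The function name proxies_in_pac is hypothetical, and the sketch only scans for "PROXY name" strings; it does not evaluate the JavaScript, so it cannot tell you which server applies to your connection.

```python
import re

def proxies_in_pac(pac_text):
    """Return the distinct proxy entries mentioned in a PAC script, in order."""
    found = []
    # A PAC return value looks like "PROXY host" or "PROXY host:port; PROXY other".
    for entry in re.findall(r'\bPROXY\s+([^\s;"\']+)', pac_text):
        if entry not in found:
            found.append(entry)
    return found
```

Fed the output shown above, this would list px1.ourdomain.com and px2.ourdomain.com.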

II. Case Study: Extend ckweb.exp

We want to monitor our Oracle 9iAS web form URL, http://site:7777/forms90/f90servlet?config=ourapp. When we telnet to it and send GET /forms90/f90servlet?config=ourapp, we always get 500 Internal Server Error followed by somewhat garbled Java errors. We then add User-Agent: ... to the HTTP request header in the hope that this is the only header we're missing, but we still get the same errors. Finally we use a network sniffer and see that a real browser actually sends:
GET /forms90/f90servlet?config=r14 HTTP/1.1
Accept: image/gif,image/x-xbitmap, image/jpeg, image/pjpeg, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/x-shockwave-flash, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
Host: site:7777
Connection: Keep-Alive
After some testing, we find that in addition to GET and Host, only Accept-Language and User-Agent are required. Without either one of them, we get a 500 return code. So we add

  send "Accept-Language: en-us\r"
  send "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)\r"
to ckweb.exp after we send GET. Now we can expect a 200 return code, and the monitoring script works happily.
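
The decision the monitor makes at the end, whether the return code is one of those listed in column 5 of the weblist file, can be sketched in Python as follows. This is only an illustration of the check, not the Expect code in ckweb.exp; status_code and is_acceptable are names I made up.

```python
def status_code(response_text):
    """Return the numeric code from the first line of a response, or None."""
    lines = response_text.splitlines()
    parts = lines[0].split() if lines else []
    # A status line looks like: HTTP/1.1 200 OK
    if len(parts) >= 2 and parts[0].startswith("HTTP/") and parts[1].isdigit():
        return int(parts[1])
    return None

def is_acceptable(response_text, accepted="200"):
    """True if the response's code is in the comma-separated accepted list."""
    code = status_code(response_text)
    return code is not None and code in {int(c) for c in accepted.split(",")}
```

A response that is empty or does not start with an HTTP status line yields None, which the caller can treat as a failure just like a wrong code.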
