Evaluation of Performance Improvements for BCN's Web Server

Paul von Behren

April 3, 2000

This paper describes my investigation of several alternatives for speeding up BCN web server transactions. Some detail is provided on the various alternatives and my approach to analysis. For the impatient, I'll start off with conclusions:

I believe that we can get more performance gain with less effort through better education and habits (and some rework) than by investing the time it would take to install, configure, test, and troubleshoot these new environments.

Introduction

CGI scripts are used on BCN for dynamic web capabilities - page hit counters, mail forms, discussion boards, surveys, etc. BCN's web server software (Apache) launches each of these scripts as a separate program. The script executes in parallel with the web server, sends output to the user's web browser, then exits. It takes a lot more processing to launch the separate program than to simply return a web page or to run the script as part of the Apache server itself.

Most - if not all - of BCN's CGI scripts are written in the Perl programming language. One problem with Perl is that starting a Perl program takes considerably longer than starting a program written in C. When a web page calls a Perl CGI program, the web server (Apache) starts the Perl run-time engine and asks it to run the CGI script. Starting the run-time engine takes a substantial amount of time. For simple scripts, it may take longer to start the run-time than to actually run the script.
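The startup cost described above can be illustrated with a rough sketch. It uses Python rather than Perl purely as a stand-in, and the function names are hypothetical. It compares running a trivial "script body" inside an already running process against launching a fresh interpreter for every run. The absolute numbers will differ from Perl's, but the shape of the comparison should be similar.

```python
import subprocess
import sys
import time


def script_body():
    """Stand-in for a trivial CGI script: build a tiny HTML fragment."""
    return "<p>Last modified: today</p>"


def time_in_process(runs):
    """Milliseconds to call the script body `runs` times in this process."""
    start = time.perf_counter()
    for _ in range(runs):
        script_body()
    return (time.perf_counter() - start) * 1000.0


def time_with_launch(runs):
    """Milliseconds to do the same work, starting a fresh interpreter each time."""
    code = "print('<p>Last modified: today</p>')"
    start = time.perf_counter()
    for _ in range(runs):
        # Each iteration pays the full interpreter startup cost, as CGI does.
        subprocess.run([sys.executable, "-c", code],
                       capture_output=True, check=True)
    return (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    runs = 20
    print("in-process:   %.1f ms" % time_in_process(runs))
    print("fresh launch: %.1f ms" % time_with_launch(runs))
```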

There are several techniques available for improving the performance of Perl programs. This paper looks at a couple of these techniques: issues in setup, performance gains, and migration from current scripts. To do this, I copied three commonly used CGI scripts from BCN - counter (page hit counter), LastMod.prl (includes the last modification date/time in a web document), and emerge (automates web surveys to email). I also investigated some other approaches to improving performance.

mod_perl

mod_perl provides a way to embed the Perl run-time in the Apache web server. This eliminates the run-time startup costs. mod_perl is tightly integrated with Apache and provides some elegant facilities for configuration and migration from CGI. mod_perl also provides a programming interface for capabilities beyond CGI, but this was not a primary objective in my analysis.

Apache needs to be rebuilt with mod_perl support. It also needs some configuration changes to support mod_perl. All existing Perl CGI programs continue to run as CGI. For my testing, I set up a directory - similar to cgi-bin - strictly for mod_perl scripts. I am not sure whether mod_perl scripts can be run under user directories. Some Perl CGI scripts may run as-is under mod_perl.

One observation about mod_perl - the documentation is poorly planned and presented. There are a variety of informal documents included with the mod_perl source. They are mostly written in a non-standard documentation format. The documents do not provide all the information needed, and sometimes contradict each other.

mod_perl script compatibility

fastcgi
fastcgi uses a co-processing model: an Apache add-on module communicates via Interprocess Communication (IPC) with a persistent Perl server. Unlike mod_perl, CGI scripts need substantial modification to run under fastcgi. Even when I followed the recommended steps, I could not get any of the three CGI scripts ported to fastcgi. I did create a CGI script that did the same things as a (modified) fastcgi demo script. This script simply echoed out the values of CGI variables. The CGI version ran faster than the fastcgi version. I gave up testing fastcgi after this.

As bad as mod_perl's documentation is, fastcgi's is even worse.

Performance Tests
As mentioned above, the performance issues with Perl CGI scripts are associated with startup time. Once the script content starts executing, the performance should be identical with or without mod_perl. With this in mind, I set up tests running short scripts, many times. To avoid disruption to BCN services, I installed mod_perl on a Linux system at home. To test, I wrote a Java program to measure the time to issue and process a large number of web requests.
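The measurement approach described above can be sketched roughly as follows. This is not the Java program used for the tests. It is a minimal Python illustration of the same idea, and the function name time_requests is an assumption.

```python
import time
import urllib.request


def time_requests(url, count):
    """Issue `count` sequential GET requests; return total elapsed milliseconds."""
    start = time.perf_counter()
    for _ in range(count):
        with urllib.request.urlopen(url) as response:
            # Read the whole body so the complete transaction is timed,
            # not just the arrival of the response headers.
            response.read()
    return (time.perf_counter() - start) * 1000.0
```

A call such as time_requests("http://localhost/perl/counter", 200) would then give one cell of a timing table. The URL here is hypothetical and depends on how the test server is laid out. Requests are issued one at a time, so the result reflects per-request latency rather than throughput under concurrent load.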

The actual performance improvement with mod_perl is heavily dependent on the type of work done in the script. The following table lists timings for two scripts used on BCN.
 

Time to process 200 requests to the "counter" and LastMod scripts

           CGI under SSI        mod_perl
counter    776 milliseconds     669 milliseconds
LastMod    1504 milliseconds    1647 milliseconds*

Counter's execution includes several I/O requests: it locks the counter file, reads it, updates one line, rewrites it, then unlocks it. In this case, mod_perl contributes little performance gain. I expected LastMod - a CGI script which displays the last date a file was modified - to do well under mod_perl. Not only was the performance slightly worse, but the date returned from LastMod under mod_perl was off by a week.
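The lock, read, update, rewrite, unlock cycle described above can be sketched as follows. This is not BCN's counter script. It is a Python illustration for Unix systems, and the file format (one "page count" pair per line) and the name bump_counter are assumptions. It shows why nearly all of the script's time goes to file I/O, which embedding the interpreter does nothing to speed up.

```python
import fcntl
import os


def bump_counter(counter_file, page):
    """Increment the count for `page` in a shared counter file; return the new count.

    Assumed (hypothetical) file format: one "page count" pair per line.
    """
    # Open for read/write, creating the file if it does not exist yet.
    fd = os.open(counter_file, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)        # 1. lock: other requests wait here
        try:
            counts = {}
            for line in f:                   # 2. read the whole file
                name, _, value = line.rpartition(" ")
                if name:
                    counts[name] = int(value)
            counts[page] = counts.get(page, 0) + 1   # 3. update one entry
            f.seek(0)                        # 4. rewrite the whole file
            f.truncate()
            for name in sorted(counts):
                f.write("%s %d\n" % (name, counts[name]))
            f.flush()
            return counts[page]
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)    # 5. unlock
```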

"Low Tech" Alternatives

There are alternative ways of doing some of the work currently done by CGI scripts on BCN. The most notable is displaying the last date or time a file was modified, for which there are several alternatives. The following table gives the number of milliseconds to complete 2000 requests to pages using each of these approaches to get the file modification time.
 
Approach        Milliseconds
SSI-CMD         54925
SSI-CGI         40154
mod_perl cgi    13600
SSI-fmodlast     8794
JavaScript       8230
none             7914
no parse         7267

As you can see in the table, mod_perl performs better than the SSI "cmd" and "cgi" methods, but not as well as the other alternatives.
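To make the comparison easier to read, the totals can be converted to a per-request time and to a multiple of the "no parse" baseline. The sketch below uses only the figures from the table above. The helper names are assumptions.

```python
# Total milliseconds for 2000 requests, copied from the table above.
TOTAL_MS = {
    "SSI-CMD": 54925,
    "SSI-CGI": 40154,
    "mod_perl cgi": 13600,
    "SSI-fmodlast": 8794,
    "JavaScript": 8230,
    "none": 7914,
    "no parse": 7267,
}
REQUESTS = 2000


def per_request_ms(total_ms, requests=REQUESTS):
    """Average milliseconds per request."""
    return total_ms / requests


def ratio_to_baseline(label, baseline="no parse"):
    """How many times slower `label` is than the baseline row."""
    return TOTAL_MS[label] / TOTAL_MS[baseline]


if __name__ == "__main__":
    for label, total in TOTAL_MS.items():
        print("%-13s %6.2f ms/request  %5.2f x no parse"
              % (label, per_request_ms(total), ratio_to_baseline(label)))
```

Read this way, the SSI "cmd" approach costs roughly seven and a half times as much as serving an unparsed page, while the three cheapest dynamic approaches stay within about a quarter of the baseline.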

Along similar lines, the counter script uses a single system-wide file for access counts. Whenever anyone selects any BCN page using this hit counter, the script sets a lock, modifies the count, then unlocks. Other users referencing a page with a hit counter must wait while the counter is locked. Although it takes some setup, the counter script can be set up to use different access count files.
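One way to picture the per-site setup is a counter file chosen from the directory of the page being served, so that pages on different sites never contend for the same lock. The sketch below is only an illustration. The file name ".counter", the function name, and the example paths are all hypothetical, and this is not how the existing counter script is configured.

```python
import os


def counter_file_for(page_path, name=".counter"):
    """Return a counter file path in the same directory as the page.

    Both the naming scheme and the default file name are assumptions.
    """
    return os.path.join(os.path.dirname(os.path.abspath(page_path)), name)


if __name__ == "__main__":
    # Hypothetical page locations for two different sites.
    print(counter_file_for("/home/site_a/public_html/index.html"))
    print(counter_file_for("/home/site_b/public_html/index.html"))
```

Because each site gets its own file, a lock held while one site's count is updated does not delay hits on any other site.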

Recommendations
I will write up a proposal to modify the BCN developers resource page to describe the recommended way to get a file's modification date, using the SSI "flastmod" directive (the SSI-fmodlast approach measured above). I will also add text to that page explaining how to access analog files and recommending that hit counters be avoided.

I will investigate the counter script some more and see if the script can be installed once and always use the "current" directory for the access counter file. If not, then we need to consider whether we want the administrative work of installing counter separately for each site that uses it.

I will look into implementing performance improvements in the BASIN pages.

We may want a TAC discussion of whether we should migrate towards only parsing .shtml pages rather than .html pages.