Evaluation of Performance Improvements for BCN's Web Server
Paul von Behren
April 3, 2000
This paper describes my investigation of several alternatives for speeding
up BCN web server transactions. Some detail is provided on the various
alternatives and my approach to analysis. For the impatient, I'll start
off with conclusions:
- I looked at two "extensions" for our web server software, mod_perl and
fastcgi; these extensions are often recommended as ways to improve the
performance of CGI scripts.
- The two extensions can speed up CGI scripts in certain cases, but probably
not on BCN. They would be most effective for new perl scripts.
- Legacy scripts may take a lot of work to port to these environments. In
particular, the scripts need to work under a recent release of perl
(perl 5.004 or newer); many of BCN's scripts are using perl 4. Even if
a script works in these new environments, sloppiness that was harmless
under CGI can cause problems there.
- Of the three BCN scripts which I tried to test, I got only one script fully
working under mod_perl and none working under fastcgi.
I believe that we can get more performance gain with less effort through
better education and habits (and some rework) rather than investing the
time it would take to install, configure, test, and troubleshoot these
new environments.
Introduction
CGI scripts are used on BCN for dynamic web capabilities - page hit counters,
mail forms, discussion boards, surveys, etc. BCN's web server software
(Apache) launches these scripts as separate, running programs. The scripts
execute in parallel with the web server, send output to the user's web
browser, then exit. It takes a lot more processing to launch the separate
program than to simply return a web page or to run the script as part of
the Apache server itself.
Most - if not all - of BCN's CGI scripts are in the Perl programming
language. A problem associated with Perl is that the time to start a Perl
program is considerably longer than the time it takes to start a program
written in C. When a web page calls a CGI Perl program, the web
server (Apache) starts the Perl run-time engine and asks it to run the CGI
script. It takes a substantial amount of time to start the Perl run-time
engine. For simple scripts, it may take longer to start the run-time than
to actually run the script.
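The startup cost described above can be illustrated with a minimal sketch. It is written in Python rather than Perl, so the Python interpreter stands in for the Perl run-time engine, and handle_request is a hypothetical stand-in for a very simple script. The absolute numbers will differ from Perl's; the sketch only shows how to compare "start a new interpreter per request" with "run inside an interpreter that is already loaded".

```python
import subprocess
import sys
import time


def handle_request():
    """Hypothetical stand-in for the body of a very simple CGI script."""
    return "Content-type: text/plain\n\nhello\n"


def time_spawned(runs):
    """Average seconds per run when a fresh interpreter is started each time.

    The child runs an empty program, so this measures startup cost only.
    """
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


def time_in_process(runs):
    """Average seconds per run when the interpreter is already loaded."""
    start = time.perf_counter()
    for _ in range(runs):
        handle_request()
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    print(f"new interpreter per request: {time_spawned(10) * 1000:.2f} ms")
    print(f"already-running interpreter: {time_in_process(10) * 1000:.4f} ms")
```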
There are several techniques available for helping the performance of
Perl programs. This paper looks at a couple of these techniques: issues
in set up, performance gains, and migration from current scripts. To do
this, I copied three commonly used CGI scripts from BCN: counter (page
hit counter), LastMod.prl (includes the last modification date/time in a
web document), and emerge (automates web surveys to email). I also investigated
some other approaches to helping performance.
mod_perl
mod_perl provides a way to embed the Perl runtime in the Apache web server.
This eliminates the run-time startup costs. mod_perl is tightly integrated
with Apache and provides some elegant facilities for configuration and
migration from CGI. mod_perl also provides a programming interface for
capabilities beyond CGI, but this was not a primary objective in my analysis.
Apache needs to be rebuilt with mod_perl support. It also needs some
configuration changes to support mod_perl. All existing perl CGI programs
continue to run as CGI. For my testing, I set up a directory, similar
to cgi-bin, strictly for mod_perl scripts. I am not sure whether mod_perl
scripts can be run under user directories. Some perl CGI scripts may run
as-is under mod_perl.
One observation about mod_perl - the documentation is poorly planned
and presented. There are a variety of informal documents included with
the mod_perl source. They are mostly written in a non-standard documentation
format. The documents do not provide all the information needed, and sometimes
contradict each other.
mod_perl script compatibility
- mod_perl requires perl 5.
- BCN's version of counter will not run under perl 5 as is. The Perl error
message did nothing to identify the problem; numerous people described
the problem in news groups, but no fixes or workarounds were mentioned.
I modified it so that it would run under perl 5.
- Under conventional CGI, resources are implicitly freed when the script
ends; this is not true under mod_perl or fastcgi. There don't appear to
be any tools which help find leaks in scripts or detect them while running.
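The last point can be made concrete with a small sketch. It is in Python rather than Perl, and the handler names and the request_log variable are hypothetical. It shows the kind of sloppiness that is harmless when every request runs in a fresh process but accumulates when one long-lived process serves many requests.

```python
# Module-level state is created once per process. Under conventional CGI the
# process exits after one request, so this list never holds more than one item.
request_log = []


def leaky_handler(page):
    """Appends to global state and never clears it."""
    request_log.append(page)
    return len(request_log)


def clean_handler(page):
    """Keeps per-request state in a local variable, freed on return."""
    log = [page]
    return len(log)
```

In a persistent process, three calls to leaky_handler leave three entries behind and the list keeps growing with every request; clean_handler leaves nothing behind.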
fastcgi
fastcgi uses a co-processing model; an Apache add-on module communicates
via Interprocess Communications (IPC) to a persistent perl server. Unlike
mod_perl, CGI scripts need substantial modification to run under fastcgi.
Even when I followed the recommended steps, I could not get any of the three
CGI scripts ported to fastcgi. I did create a CGI script that did the same
things as a (modified) fastcgi demo script. This script simply echoed
out the values of CGI variables. The CGI version ran faster than the fastcgi
version. I gave up testing fastcgi after this.
As bad as mod_perl's documentation is, fastcgi's is even worse.
Performance Tests
As mentioned above, the performance issues with Perl CGI scripts are
associated with startup time. Once the script content starts executing,
the performance should be identical with or without mod_perl. With this
in mind, I set up tests running short scripts, many times. To avoid disruption
to BCN services, I installed mod_perl on a Linux system at home. To test,
I wrote a Java program to measure the time to issue and process a large
number of web requests.
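A timing harness of this kind can be sketched as follows. This is not the Java program used for the tests; it is a minimal Python illustration of the same idea, and the names fetch_url and time_requests are hypothetical. It issues the requests one after another and reports elapsed wall-clock time in milliseconds.

```python
import time
import urllib.request


def fetch_url(url):
    """Fetch one page and return the number of bytes read."""
    with urllib.request.urlopen(url) as response:
        return len(response.read())


def time_requests(url, count, fetch=fetch_url):
    """Issue `count` sequential requests; return elapsed milliseconds."""
    start = time.perf_counter()
    for _ in range(count):
        fetch(url)
    return (time.perf_counter() - start) * 1000.0
```

Passing the fetch function as a parameter keeps the timing loop separate from the network code, so the loop can be checked without a web server.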
The actual performance improvement with mod_perl is heavily dependent
on the type of work done in the script. The following table lists timings
for two scripts used on BCN.
Time to process 200 requests to the "counter" and LastMod scripts

            | CGI under SSI     | mod_perl
counter     | 776 milliseconds  | 669 milliseconds
LastMod     | 1504 milliseconds | 1647 milliseconds*
Counter's execution includes several I/O requests: it locks the counter
file, reads it, updates one line, rewrites it, then unlocks it. In this
case, mod_perl contributes little performance gain. I expected LastMod,
a CGI script which displays the last date a file was modified, to do
well under mod_perl. Not only was the performance slightly worse, but the
date returned from LastMod under mod_perl was off by a week.
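To put the totals in per-request terms, the arithmetic can be sketched as below. The figures are the ones from the table above; the function name summarize is just for illustration.

```python
REQUESTS = 200

# (CGI under SSI, mod_perl) totals in milliseconds, taken from the table.
timings_ms = {"counter": (776, 669), "LastMod": (1504, 1647)}


def summarize(cgi_ms, mod_perl_ms, requests=REQUESTS):
    """Return per-request times and the percent change going to mod_perl."""
    per_request_cgi = cgi_ms / requests
    per_request_mod_perl = mod_perl_ms / requests
    change_pct = (mod_perl_ms - cgi_ms) / cgi_ms * 100.0
    return per_request_cgi, per_request_mod_perl, change_pct


if __name__ == "__main__":
    for name, (cgi_ms, mod_perl_ms) in timings_ms.items():
        cgi, mod, change = summarize(cgi_ms, mod_perl_ms)
        print(f"{name}: {cgi:.2f} ms -> {mod:.2f} ms per request ({change:+.1f}%)")
```

For counter this works out to roughly 3.9 ms versus 3.3 ms per request (about 14% faster under mod_perl); for LastMod, roughly 7.5 ms versus 8.2 ms (about 10% slower).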
"Low Tech" Alternatives
There are alternative ways of doing some of the work currently done by
CGI scripts on BCN. The most notable is displaying the last date or time
a file was modified. Here are some alternatives for determining the last
modification time:
- Server Side Includes (SSIs) are special HTML commands to BCN's web server.
The LastMod script is typically called through the SSI "cmd" form, which
can run any program, not only CGI programs. SSI can also call a CGI program
directly; this requires changing the LastMod script to add a "Content-type"
header. This gives us the SSI-CMD and SSI-CGI approaches.
- SSI also has built-in support for file modification times (the flastmod
command). In the early versions of SSI, the date/time formatting was limited,
which led people to create scripts like LastMod. Formatting is now pretty
flexible.
- I also measured performance using mod_perl, even though it produced
bad information (the wrong file modification date).
- JavaScript includes a command to get the current document's modification
time. There are trade-offs in using JavaScript (not all browsers support
it), but it is an alternative.
- I'm not sure that web page viewers generally care about last modification
times. The people responsible for the pages care, but other facilities
can be used to determine file modification dates as needed. I ran a test
by simply removing all commands to CGI/SSI/JavaScript/...
- Well-behaved web servers are supposed to include page modification information
as part of the hidden headers sent with each page. BCN's web server is
configured to treat each page as an SSI page, which prevents it from returning
this header. I configured my server to parse .shtml pages and not parse
.html pages. The non-parsed pages perform a bit better and allow interested
users to view the modification information with their browser's "view page
info" command. Unfortunately, disabling parsing on BCN will require a fair
amount of work to change pages which do require SSI.
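The "hidden header" in the last item is the HTTP Last-Modified header. As a rough illustration of what a server sends for a non-parsed page, the sketch below formats a file's modification time as an HTTP date and parses such a value back. It uses only the Python standard library; the two function names are hypothetical, and this is not how Apache itself is implemented.

```python
import os
from email.utils import formatdate, parsedate_to_datetime


def last_modified_header(path):
    """Format a file's modification time the way a Last-Modified header does."""
    return formatdate(os.path.getmtime(path), usegmt=True)


def parse_last_modified(value):
    """Turn a Last-Modified header value back into a datetime."""
    return parsedate_to_datetime(value)
```

A browser's "view page info" command is displaying this same header value.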
The following table gives the number of milliseconds to complete 2000 requests
to pages using each of these approaches to get the file modification time.
Approach       | Milliseconds for 2000 requests
SSI-CMD        | 54925
SSI-CGI        | 40154
mod_perl cgi   | 13600
SSI-flastmod   | 8794
JavaScript     | 8230
none           | 7914
no parse       | 7267
As you can see in the table, mod_perl performs better than the
SSI "cmd" and "cgi" methods, but not as well as the other alternatives.
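One way to read these figures is to subtract the "none" row, which is the cost of serving the page with no modification date at all, and look at what each approach adds per request. The sketch below does that arithmetic on the table's numbers; treating "none" as the baseline is my choice for illustration.

```python
REQUESTS = 2000

# Total milliseconds for 2000 requests, taken from the table.
totals_ms = {
    "SSI-CMD": 54925,
    "SSI-CGI": 40154,
    "mod_perl cgi": 13600,
    "SSI flastmod": 8794,
    "JavaScript": 8230,
    "none": 7914,
    "no parse": 7267,
}


def extra_ms_per_request(approach, baseline="none"):
    """Milliseconds per request added by an approach, relative to the baseline."""
    return (totals_ms[approach] - totals_ms[baseline]) / REQUESTS


if __name__ == "__main__":
    for name in totals_ms:
        print(f"{name:14s} {extra_ms_per_request(name):+8.3f} ms per request")
```

On these numbers SSI-CMD adds about 23.5 ms per request, mod_perl about 2.8 ms, and the built-in SSI flastmod command about 0.4 ms.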
Along similar lines, the counter script uses a single system-wide file
for access counts. The script sets a lock, modifies the count, then unlocks
whenever anyone selects any BCN page using this hit counter. Other users
referencing a page with a hit counter must wait while the counter is locked.
Although it takes some setup, the counter script can be set up to use different
access count files.
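The lock, update, unlock cycle can be sketched as follows. This is a simplified Python illustration, not the BCN counter script: it assumes a Unix system (fcntl), keeps a single number per file rather than one line per page, and the function name increment_counter is hypothetical. Because the file path is a parameter, each site can be given its own count file, so requests for one site do not wait on another site's lock.

```python
import fcntl
import os


def increment_counter(path):
    """Lock the count file, add one to the stored number, and unlock."""
    # Open for read/write, creating the file if needed, without truncating it.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        # Blocks here while another process holds the lock on the same file.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            text = f.read().strip()
            count = int(text) + 1 if text else 1
            f.seek(0)
            f.truncate()
            f.write(f"{count}\n")
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return count
```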
Recommendations
I will write up a proposal to modify the BCN developers resource page
describing the recommended way to get a file's modification date using
SSI flastmod. I will also add text to that page explaining how to access
analog files and recommending that hit counters be avoided.
I will investigate the counter script some more and see if the script
can be installed once and always use the "current" directory for the access
counter file. If not, then we need to consider whether we want the administrative
work of installing counter separately for each site that uses it.
I will look into implementing performance improvements in the BASIN
pages.
We may want a TAC discussion of whether we should migrate towards only
parsing .shtml pages rather than .html pages.