Sending large pages of HTML is no longer a problem now that we can send the pages compressed!
After two weeks of non-stop programming, your web application is ready and tested. Everything is A-OK, and the grins on your clients' faces remind you in a strange way of "Jaws". Except for one thing: someone in the back timidly asks if something could be done to speed up that search result page which contains so much text. At that instant you know you should have brought a full copy of the application with you instead of demonstrating it over the 33.6 kbps modem line at work.
But all is not lost. There are ways to reduce the amount of data that you need to send to the client, and I'm not talking about transmitting less information but rather about sending compressed data. Sounds interesting? Read on and learn how.
Frivolous assumptions
Since this article describes functionality and techniques that will add to the complexity of a web application, it's assumed you already know how to create a web application, specifically an ISAPI dll, as well as how such an application works. As such I will skip over some details but rest assured - I will present you with all the options and code needed specifically for the techniques we're implementing here.
With that in mind, let's go squeeze some more juice out of your good ol' Internet line.
How is the magic done?
How is this possible, you might ask? You've probably at one time or another downloaded a compressed file only to find that your browser somehow interpreted the data as either a web page or text and displayed it on your screen in all its glory. Not a pretty picture, to say the least. If you were to compress the data before sending it to the client, wouldn't it look just as strange then? Not quite, as long as you make sure the client knows that it should handle the data differently.
The secret is hidden both inside the data the client sends to the webserver and inside the data the webserver responds with. It's called Content Encoding. In short, you can encode the data your application returns to the client, and the only precaution you need to take is to make sure the client knows how to handle the data in the encoding format you choose. This in turn is simple, since the client tells you what formats it can handle when it sends the request to the server.
So what it all boils down to, is that you need to take the following steps if you want to encode the data returned to the client:
1. Check whether the client can handle the encoding type you want to use
2. Encode the data in the chosen format
3. Return the newly encoded data and tell the client what format you've encoded it in
What format should I use?
We are interested in compressing the data that we return to the client. There is an encoding type specifically for this purpose, and its name is "deflate". The compression algorithm used in the deflate encoding type corresponds to the algorithm that the zLib compression library implements. You can read more about this library here: http://www.cdrom.com/pub/infozip/zlib or check RFC 1950, which describes the zLib binary format (the deflate algorithm itself is described in RFC 1951), here: http://www.funet.fi/pub/doc/rfc/rfc1950.txt.
Although you might think that you already have the files needed for using the zLib compression library, you don't! At least not exactly. Even though the Delphi installation CD comes with a copy of the zLib compression library in the form of precompiled object files and some import files, they hide the details we need to use. More on that later, but for now let's just say we need a better interface to the library, and for that I have chosen to supply you with my own zLib import unit and a link to the downloadable precompiled dll: http://www.winimage.com/zLibDll/.
As for the 'deflate' encoding type, only Microsoft Internet Explorer appears to handle it, and only the later versions at that (version 4 and up handle it for sure; I am not sure about anything below that). This is not a big problem, however, since other browsers, such as Netscape, don't claim to handle the deflate encoding type. In that case our web application simply returns uncompressed data, and the only difference is that the download takes a little longer. This is no worse than what we have today, so I think we can live with that.
Ok, I got the files, now what?
Now it's time to get down to the gory details. Let's get off to a good start by creating a new ISAPI project in Delphi 5 and see where that takes us. You should add the downloaded import unit to this project as well. The dll you just downloaded can be put either in the C:\Winnt\System32 directory (or your corresponding directory) or in the same directory as your web application.
After creating the new project let's add some action to it, literally. Add an action to the web module and create an empty event handler for it. Make the action the default action as well since this is just a demo application for trying out our new way of returning data.
We now have an empty action event handler, so let's add some code to it that will make it do what we need. I'll show the complete event handler first and then I'll go through the details.
procedure TWebModule1.WebModule1WebActionItem1Action(Sender: TObject;
  Request: TWebRequest; Response: TWebResponse; var Handled: Boolean);
var
  PlaintextStream  : TStream;
  CompressedStream : TStream;
begin
  if ( ClientAcceptsDeflate( Request ) ) then
  begin
    // 1. First, create temporary stream with the data to return
    PlaintextStream := TStringStream.Create( 'This text is compressed' );
    try
      // 2. Second, create temporary stream for our compressed data
      CompressedStream := TMemoryStream.Create;
      try
        // 3. Now compress the stream...
        zLibCompressStream( PlaintextStream, CompressedStream );
        // ... and return it
        CompressedStream.Position := 0;
        Response.ContentStream := CompressedStream;
      except
        FreeAndNil( CompressedStream );
        raise;
      end; // try except - avoid memory leaks
    finally
      // 4. Finally tidy up temporary object
      FreeAndNil( PlaintextStream );
    end; // try finally - destroy plaintext stream object
    Response.ContentType := 'text/plain';
    Response.ContentEncoding := 'deflate';
    Response.StatusCode := 200;
    Handled := True;
  end // if client accepts compressed data
  else begin
    Response.Content := 'Not compressed';
    Response.ContentType := 'text/plain';
    Response.StatusCode := 200;
    Handled := True;
  end; // if client does not accept compressed data
end; // procedure TWebModule1.WebModule1WebActionItem1Action
The primary if-statement here determines whether or not the client can handle the compressed data and then sends either compressed or uncompressed data back to the client accordingly. The uncompressed data is handled as you have always handled data in a web application so we won't discuss that further. Instead we're going to concentrate on the if-then part of the if-statement that handles compressed data. You probably noticed that we're using two new procedures/functions here, namely ClientAcceptsDeflate and zLibCompressStream. I will go through those later in this article.
Assuming we have a procedure that takes one stream as input, compresses the data this stream holds and writes the compressed data to another stream as output, we can describe the code shown above like this:
1. First, create a temporary stream containing whatever we want to return to the client
2. Second, compress this data and put the compressed data into a new stream
3. This new stream, holding our compressed data, we simply return to the client
4. Finally, we tidy up our temporary objects
You can find the matching points of this list in the numbered comments of the above event handler. It's pretty basic code, and it ought to be too since we've hidden the gory details in two functions, which we'll discuss next.
One thing to note is that once we assign our stream to the ContentStream property of the Response object, the Response object takes ownership of the stream. Once the response data has been sent to the client the stream will be freed for us, so we must make sure we don't accidentally free it ourselves. In the case of an exception, however, I make the assumption that the assignment went haywire and thus free the stream before propagating the exception higher up.
Parlez-vous français?
To determine whether the client knows how to handle compressed data we have to take a look at the data it sends us in the first place. A typical web request looks like this (fake request, so the details might not be 100% correct):
GET /index.html HTTP/1.0
Accept: */*
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)
What we're interested in is the line that goes Accept-Encoding: gzip, deflate. It tells us what encoding types the client is able to accept, and in this case it can accept data that is encoded in the gzip format as well as the deflate format. The latter is the one we need, so let's see how to obtain that knowledge from within our web application.
The function we need to write looks like this:
function ClientAcceptsDeflate( const Request: TWebRequest ): Boolean;
var
  EncodingTypes : string;
begin
  // Get and reformat list of encoding types from the request
  EncodingTypes := Request.GetFieldByName( 'HTTP_ACCEPT_ENCODING' );
  EncodingTypes := UpperCase( StringReplace( EncodingTypes, ',', '/', [ rfReplaceAll ] ) );
  EncodingTypes := '/' + StringReplace( EncodingTypes, ' ', '', [ rfReplaceAll ] ) + '/';
  // Return the flag
  Result := ( Pos( '/DEFLATE/', EncodingTypes ) > 0 );
end; // function ClientAcceptsDeflate
In short I reformat the values gzip, deflate into /GZIP/DEFLATE/ and then check to see if the string /DEFLATE/ is found within it. If you're interested in knowing what other fields can be found in the request then I suggest you take a look at http://msdn.microsoft.com/library/psdk/iisref/isre504l.htm and use the ALL_HTTP variable to check what variables the client actually sends.
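One case the reformatting trick above does not cover is a client that attaches a quality value to an encoding, for example deflate;q=0.5, because the token then no longer reads exactly /DEFLATE/. With the function as written this only means we fall back to uncompressed data, which is safe. If you want to honour such headers, a minimal sketch could look like the program below. Note that HeaderAcceptsDeflate and IsZeroQuality are hypothetical helper names, not part of the downloadable unit, and that the sketch works on the raw header string rather than on a TWebRequest. It also ignores the "*" wildcard.

```pascal
program AcceptEncodingDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: True when Params is a quality value of zero
// ("q=0", "q=0.0", ...), which means the encoding is NOT acceptable.
function IsZeroQuality(const Params: string): Boolean;
var
  Value: string;
  I: Integer;
begin
  Result := False;
  if UpperCase(Copy(Params, 1, 2)) <> 'Q=' then
    Exit;
  Value := Copy(Params, 3, MaxInt);
  if Value = '' then
    Exit;
  // Anything other than zeros and a decimal point is a non-zero quality
  for I := 1 to Length(Value) do
    if (Value[I] <> '0') and (Value[I] <> '.') then
      Exit;
  Result := True;
end;

// Hypothetical helper: checks a raw Accept-Encoding header value
function HeaderAcceptsDeflate(const HeaderValue: string): Boolean;
var
  Rest, Token, Coding, Params: string;
  P: Integer;
begin
  Result := False;
  Rest := HeaderValue;
  while Rest <> '' do
  begin
    // Cut off the next comma-separated token
    P := Pos(',', Rest);
    if P = 0 then
      P := Length(Rest) + 1;
    Token := Trim(Copy(Rest, 1, P - 1));
    Delete(Rest, 1, P);
    // Separate the encoding name from optional parameters such as ";q=0.5"
    P := Pos(';', Token);
    if P = 0 then
      P := Length(Token) + 1;
    Coding := Trim(Copy(Token, 1, P - 1));
    Params := StringReplace(Copy(Token, P + 1, MaxInt), ' ', '', [rfReplaceAll]);
    if CompareText(Coding, 'deflate') = 0 then
    begin
      Result := not IsZeroQuality(Params);
      Exit;
    end;
  end;
end;

begin
  WriteLn(HeaderAcceptsDeflate('gzip, deflate'));           // TRUE
  WriteLn(HeaderAcceptsDeflate('gzip, deflate;q=0.5'));     // TRUE
  WriteLn(HeaderAcceptsDeflate('gzip;q=1.0, deflate;q=0')); // FALSE
  WriteLn(HeaderAcceptsDeflate('gzip'));                    // FALSE
end.
```

Inside the web module you would pass it the same value that ClientAcceptsDeflate reads from the request.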
Naturellement, parlons!
After we've determined that the client can indeed handle compressed data all we've got left to do is actually produce the compressed data and this is where the magic enters.
As stated earlier, we will use the zLib compression library to do the actual compressing. The code involves the following steps:
1. Set up buffers for feeding data to the engine as well as accepting compressed data from it
2. Initialize the compression engine
3. Feed plain text data into the input buffer from the input stream
4. Compress the input buffer to the output buffer
5. Write data from the output buffer to the output stream
6. Repeat steps 3-5 until there is no more data in the input stream and the buffers have been emptied
7. Close the compression engine
Let's dig into the details and see what we have to deal with:
procedure zLibCompressStream( const Source, Destination: TStream );
var
  z_s : z_stream;
  rc  : Integer;
  // 1. Buffers for input and output
  //    (BufferSize is a constant declared elsewhere in the unit)
  SourceBuffer      : array[ 0..BufferSize-1 ] of Byte;
  DestinationBuffer : array[ 0..BufferSize-1 ] of Byte;
begin
  // 2. Prepare the zLib data record
  z_init_zstream( z_s );
  z_s.next_in := @SourceBuffer;
  z_s.next_out := @DestinationBuffer;
  z_s.avail_out := BufferSize;
  // 2. Initialize the compression engine; the negative window size (-15)
  //    suppresses the zLib header
  deflateInit2( z_s, Z_BEST_COMPRESSION, Z_DEFLATED, -15, 9, Z_DEFAULT_STRATEGY );
  // Now compress the stream
  try
    repeat
      // 3. See if we got to feed more data to the compression engine
      if ( z_s.avail_in = 0 ) and ( Source.Position < Source.Size ) then
      begin
        z_s.next_in := @SourceBuffer;
        z_s.avail_in := Source.Read( SourceBuffer, BufferSize );
      end; // if input data completely depleted
      // 4. Compress the data. Z_FINISH tells the engine that no more input
      //    follows; Z_NO_FLUSH lets it decide when to emit output.
      //    (Z_STREAM_END is a return code, not a flush mode.)
      if ( z_s.avail_in = 0 ) then
        rc := deflate( z_s, Z_FINISH )
      else
        rc := deflate( z_s, Z_NO_FLUSH );
      // 5. Check if we got compressed data to write to the destination
      if ( z_s.avail_out = 0 ) or ( rc = Z_STREAM_END ) then
      begin
        Destination.WriteBuffer( DestinationBuffer, BufferSize - z_s.avail_out );
        z_s.avail_out := BufferSize;
        z_s.next_out := @DestinationBuffer;
      end; // if got data available for writing
    // 6. Repeat until the engine reports end of stream or an error
    until ( rc <> Z_OK );
    // Anything other than Z_STREAM_END means the output is incomplete
    // (Exception is declared in SysUtils)
    if ( rc <> Z_STREAM_END ) then
      raise Exception.CreateFmt( 'zLib deflate failed with code %d', [ rc ] );
  finally
    // 7. Clean up the engine data
    deflateEnd( z_s );
  end; // try finally - clean up after engine
end; // procedure zLibCompressStream
As before, you can match the points from this list with the numbered comments above. The reason we could not use the zLib code that's included with Delphi is that it hides the deflateInit2 routine and the necessary parameters inside the implementation part of the unit, and does not expose all the code we need.
In order to produce compressed data in a manner that the browser can handle, we need to compress the data with no header record. The header record is a small record (two bytes) written to the very start of the compressed data that tells the decompression engine which compression method and window size were used; with the header enabled, zLib also appends a checksum to the end of the data. We can opt not to write this wrapper by passing a negative value for the wBitSize parameter (called windowBits in the zLib documentation) to the deflateInit2 procedure. Since the browsers that support the deflate encoding neither expect this header nor know how to handle it, we have to leave it out. And since we could not call deflateInit2 directly with the zLib code that came with Delphi, we had to resort to a full dll copy of the compression library.
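If the browser shows garbage, a quick sanity check is to look at the first two bytes of what your application produced and see whether a zLib header slipped in after all. The sketch below applies the header rules from RFC 1950: the low four bits of the first byte must be 8 (deflate), the high four bits at most 7, and the two bytes read as a 16-bit number must be divisible by 31. LooksLikeZlibHeader is a hypothetical helper name, and the test is only a heuristic, since a headerless stream could in principle begin with bytes that happen to satisfy it.

```pascal
program ZlibHeaderCheck;

{$APPTYPE CONSOLE}

// Hypothetical helper: True when the two bytes form a valid RFC 1950 header
function LooksLikeZlibHeader(CMF, FLG: Byte): Boolean;
begin
  Result := ((CMF and $0F) = 8)            // compression method 8 = deflate
    and ((CMF shr 4) <= 7)                 // window size of at most 32 KB
    and (((CMF * 256 + FLG) mod 31) = 0);  // header check bits
end;

begin
  WriteLn(LooksLikeZlibHeader($78, $DA)); // TRUE: a common zLib header
  WriteLn(LooksLikeZlibHeader($78, $9C)); // TRUE: another common zLib header
  WriteLn(LooksLikeZlibHeader($0B, $C9)); // FALSE: arbitrary bytes, low nibble is not 8
end.
```

If the check reports True for the first two bytes of your compressed stream, the negative window size most likely did not reach deflateInit2.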
The compression engine compresses data from the input buffer and writes it to the output buffer. When the output buffer is full, our code needs to flush this buffer and write the data in it to the destination, in our case a stream. When the engine has consumed all the data in the input buffer, our code needs to fill up the buffer again with as much data as possible. The compression engine takes care of the rest.
Testing it
After compiling your web application (see bottom of article for a copy of the example project implemented in this article) you should ideally test it with both a browser that handles compressed data and with one that doesn't. You can use Internet Explorer 4/5 as the former and Netscape 4.06 as the latter. The browser that handles compression should show the text 'This text is compressed' and the other one 'Not compressed' for verification.
The average compression ratio on text-based content is a factor of approximately 5-6 (the compressed data is roughly 17-20% of the original size), so the effect should be clearly noticeable on large web pages.
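To put that ratio into perspective, here is a small back-of-the-envelope calculation for the 33.6 kbps modem line from the introduction. The page size of 100,000 bytes is a made-up figure for illustration, TransferSeconds is a hypothetical helper, and the sketch ignores protocol overhead, latency and any compression the modem itself performs.

```pascal
program TransferTime;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: seconds needed to push ByteCount bytes through a
// line of the given raw speed, ignoring all overhead
function TransferSeconds(ByteCount: Integer; BitsPerSecond: Double): Double;
begin
  Result := (ByteCount * 8) / BitsPerSecond;
end;

const
  PageBytes = 100000; // assumed size of a large search result page
  ModemBps  = 33600;  // 33.6 kbps modem line

begin
  WriteLn(Format('Uncompressed:       %.1f s', [TransferSeconds(PageBytes, ModemBps)]));
  WriteLn(Format('Compressed 5 times: %.1f s', [TransferSeconds(PageBytes div 5, ModemBps)]));
  WriteLn(Format('Compressed 6 times: %.1f s', [TransferSeconds(PageBytes div 6, ModemBps)]));
end.
```

Under these assumptions the page takes about 24 seconds uncompressed and between 4 and 5 seconds compressed.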
Wrapping it up
Well, that's it. With the code and knowledge contained in this article you should now be able to return compressed data from your web application. Even though we created an ISAPI dll in this article, the theory and code should be the same for CGI and NSAPI applications.
I've taken the liberty to create a unit with the two functions described above, as well as a copy of the example produced in this article. You can download the files from the list below. If there are any suggestions or things you would like to comment on I can be reached at lasse@cintra.no.
Files for download:
zLib dll (from author's website)
Import unit for zLib.dll
Example project (including unit with the two functions as well as import unit)
Only the unit with the two functions we wrote
There are a couple of finishing notes to bear in mind:
The compression engine does not determine whether the data lends itself easily to compression before it starts chewing on it. This means that it is possible to feed data through it that cannot be compressed and might even increase in size instead. For text and web pages this is not a problem, but I would do some tests before feeding jpegs or gifs into it.
The compression is done server-side before it's sent so if the client is trying to download a very large web page then essentially the web application loads the entire page into memory, compresses it and sends it. If memory consumption on the server-side is a problem then I would suggest implementing the compression code in a TStream-derived class that compresses when you read from it. That way compression is done on-the-fly as the data is sent and can be fed directly off the disk through the compression library to the client. Classes for doing this are available at my homepage in the package called StreamFilter.
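Regarding the first note, one simple safeguard is to compare the sizes of the two streams after compressing and only send the compressed one when it is actually smaller. The sketch below shows just that decision. CompressionPaidOff is a hypothetical helper name, and the memory streams are sized by hand to stand in for real page data and real compressor output.

```pascal
program PickSmallerStream;

{$APPTYPE CONSOLE}

uses
  Classes;

// Hypothetical helper: True when sending the compressed stream saves bytes
function CompressionPaidOff(Plain, Compressed: TStream): Boolean;
begin
  Result := Compressed.Size < Plain.Size;
end;

var
  PlainData, Candidate: TMemoryStream;
begin
  PlainData := TMemoryStream.Create;
  Candidate := TMemoryStream.Create;
  try
    PlainData.Size := 1000; // stand-in for 1000 bytes of page data
    Candidate.Size := 1010; // stand-in for "compressed" output that grew
    WriteLn(CompressionPaidOff(PlainData, Candidate)); // FALSE
    Candidate.Size := 180;  // stand-in for a typical text result
    WriteLn(CompressionPaidOff(PlainData, Candidate)); // TRUE
  finally
    Candidate.Free;
    PlainData.Free;
  end;
end.
```

In the event handler you would then set the ContentEncoding property to 'deflate' only when the compressed stream is the one you return.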