Compressing Serialized Objects in GET Parameters
Many web applications face the common problem of parsing GET parameters to render a page. Most often, those parameters drive some sort of database operation that delivers the correct records or view, a process that might include pagination, searching, sorting or programmatic transformations of one kind or another. Regardless of whether you are using tools like Doctrine, ActiveRecord or the like to simplify and standardize this access, there is inevitably a step of sifting through parameters, sanitizing them, and structuring them to be compatible with the programming language that is ultimately responsible for the real maneuvering. It's a little like the object-relational impedance mismatch, applied to the structure of the Web vs. the structure of applications. If you have one or two parameters this is of little consequence, but sometimes it is useful to maintain more complete and structured information.
As an example, Firefly's Apollo Framework includes the extremely useful FilterDirectory class. FilterDirectory takes a database tablename as a constructor argument and then gives you complete access to all table fields (and even all fields tied by foreign key relationships) as variables within methods like setRepeatingRegion(). FilterDirectory also natively supports advanced filter conditions and pagination. Hence, a fairly complex set of conditions can potentially exist to fully describe all of the filters being applied to a particular page.
Figure 1: An example of Apollo's FilterDirectory in action.
Wouldn't it be nice if we could just serialize the server-side objects and arrays to a simple string that could be used as a GET parameter? Then a simple operation on the server could unspool the URL into all of the relevant structured data you could ever need, arrays, objects and all.
The problem, of course, is that GET requests do not offer unlimited space. On the low end, modern browsers support at least 2,000 characters. However, there are times when even that modest size can be problematic. The security extension Suhosin, for instance, will, by default, limit your GET parameters to only 512 characters.
Now, you might be tempted to simply shuffle your structured data into the active session, which, after all, easily supports your native data structures. So too could you push serialized data into the much larger space available via POST. But both of these paths will lead to trouble. If the URL itself does not contain the necessary information to reconstruct the page, the application has failed to adhere to the basic structure of the Web: such a page cannot be linked to by users or search engines! Regardless of pagination or filter conditions, the URL must be able to uniquely reproduce a page for any user at any time.
It was with these thoughts in mind that I set out to explore some basic ways of serializing some simple associative arrays and storing them in the query string with minimal cruft. The same approach should work just as well for actual objects.
We start with some straightforward data in PHP, as might correspond to a FilterDirectory object or two.
    // Sample data: two directory entries; the first carries three filters.
    $data = array();

    $data[0]['directory-id'] = 314159;
    $data[0]['page'] = 4;
    $data[0]['rpp'] = 10;

    $filters = array();
    $filters[0]['table'] = 'products';
    $filters[0]['field'] = 'title';
    $filters[0]['search-value'] = 'Red';
    $filters[1]['table'] = 'photos';
    $filters[1]['field'] = 'location';
    $filters[1]['search-value'] = 'New York';
    $filters[2]['table'] = 'photos';
    $filters[2]['field'] = 'resolution';
    $filters[2]['search-value'] = 'hi';
    $data[0]['filters'] = $filters;

    $data[1]['directory-id'] = 271828;
    $data[1]['page'] = 22;
    $data[1]['rpp'] = 5;
Serializing straight to JSON produces a string 277 characters long. Not bad! However, JSON is far from URL safe. Here are the results of some other operations:
- JSON Length: 277
- Urlencode JSON Length: 465
- PHP Serialize Length: 443
- Packed JSON Length: 252
- Urlencode Base64 Packed JSON Length: 336
- Gzip JSON Length: 158
- Urlencode Gzip JSON Length: 414
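Most of these measurements can be reproduced with PHP built-ins. The following is a minimal sketch, assuming the zlib extension is available for gzencode(); the sample data is a shortened stand-in rather than the exact test data, so the lengths it prints will differ from the figures listed here. The packed variant is left out because it relies on an external packer.

```php
<?php
// Shortened sample data (an assumption; not the article's exact test data).
$data = [
    [
        'directory-id' => 314159,
        'page' => 4,
        'rpp' => 10,
        'filters' => [
            ['table' => 'products', 'field' => 'title', 'search-value' => 'Red'],
            ['table' => 'photos', 'field' => 'location', 'search-value' => 'New York'],
        ],
    ],
];

$json = json_encode($data);

// Each entry maps a label to a candidate query-string value.
$candidates = [
    'JSON'                  => $json,
    'Urlencode JSON'        => urlencode($json),
    'PHP Serialize'         => serialize($data),
    'Urlencode Base64 JSON' => urlencode(base64_encode($json)),
    'Gzip JSON'             => gzencode($json, 9),
    'Urlencode Gzip JSON'   => urlencode(gzencode($json, 9)),
];

// Print the byte length of every candidate for comparison.
foreach ($candidates as $label => $value) {
    printf("%-24s %d\n", $label, strlen($value));
}
```

Note that the raw JSON and raw gzip rows are only baselines: neither is safe to place in a URL until it has been escaped or Base64 encoded.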
As you can see, JSON is the serialization of choice, but getting the output to be URL-safe can cause the string length to become surprisingly large. Simply calling urlencode on the JSON adds 68% to the size of the original string (465 characters versus 277); as it turns out, all of those escaped character codes are pretty bloated.
The best solution, adding only 21% to the length of the plain JSON string, is to Base64-encode a packed copy of the JSON. To pack the JSON we used JSONH, a tool that essentially reorganizes your key/value structure to be less repetitive than pure JSON. I would imagine even more capable packing algorithms may exist. Base64 then appears to succeed by avoiding the weight of all those percents and hex codes!
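To make the idea concrete, here is a rough sketch of packing followed by Base64 encoding. The helpers packRows(), unpackRows(), base64UrlEncode() and base64UrlDecode() are hypothetical names invented for this illustration; the packing only mimics the spirit of JSONH (store the shared keys once) and is not the JSONH library's API or wire format. The URL-safe Base64 alphabet is also a substitution on my part: it swaps "+" and "/" for "-" and "_" and drops the "=" padding, so the result needs no further escaping.

```php
<?php
// Hypothetical helper: pack rows that all share the same keys into
// [keys, values of row 1, values of row 2, ...] so each key appears once.
function packRows(array $rows): array
{
    if ($rows === []) {
        return [[]];
    }
    $keys = array_keys($rows[0]);
    $packed = [$keys];
    foreach ($rows as $row) {
        $values = [];
        foreach ($keys as $key) {
            $values[] = $row[$key];
        }
        $packed[] = $values;
    }
    return $packed;
}

// Hypothetical inverse of packRows().
function unpackRows(array $packed): array
{
    $keys = array_shift($packed);
    $rows = [];
    foreach ($packed as $values) {
        $rows[] = array_combine($keys, $values);
    }
    return $rows;
}

// URL-safe Base64: replace "+" and "/" with "-" and "_", drop "=" padding.
function base64UrlEncode(string $bytes): string
{
    return rtrim(strtr(base64_encode($bytes), '+/', '-_'), '=');
}

// Restores the padding and the standard alphabet before decoding.
function base64UrlDecode(string $text): string
{
    $padded = $text . str_repeat('=', (4 - strlen($text) % 4) % 4);
    $decoded = base64_decode(strtr($padded, '-_', '+/'), true);
    if ($decoded === false) {
        throw new InvalidArgumentException('Invalid Base64 input');
    }
    return $decoded;
}

$filters = [
    ['table' => 'products', 'field' => 'title', 'search-value' => 'Red'],
    ['table' => 'photos', 'field' => 'location', 'search-value' => 'New York'],
    ['table' => 'photos', 'field' => 'resolution', 'search-value' => 'hi'],
];

// Encode for the query string, then decode as the server would.
$param = base64UrlEncode(json_encode(packRows($filters)));
$restored = unpackRows(json_decode(base64UrlDecode($param), true));
```

This sketch assumes every row has exactly the same keys; mixed or nested structures would need a more careful packer.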
The next step is to benchmark many different scenarios and see how these ratios hold up over a statistically significant sample size. But regardless of the chosen method, it looks like Suhosin's 512-character limit may be a challenge for any attempt at housing serialization within GET parameters.
Another potential solution would be to store a small identifier in the database that points to a complex serialization (or any level of caching), but the expense of the extra connections would need to be carefully considered.
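As a rough illustration of the identifier approach, the sketch below keeps the serialized state in an in-memory array that stands in for a database table or cache. The class name FilterStateStore, the "state" parameter name and the 12-character truncated SHA-1 identifier are all assumptions made for this example, and it assumes PHP 7.4 or later for typed properties.

```php
<?php
// Hypothetical store; an in-memory array stands in for a database table
// or cache keyed by a short identifier.
final class FilterStateStore
{
    private array $rows = [];

    // Saves the state and returns a short identifier to place in the URL.
    public function save(array $state): string
    {
        $json = json_encode($state);
        // Deriving the id from the content lets identical states share one id.
        $id = substr(sha1($json), 0, 12);
        $this->rows[$id] = $json;
        return $id;
    }

    // Returns the stored state, or null when the id is unknown.
    public function load(string $id): ?array
    {
        if (!isset($this->rows[$id])) {
            return null;
        }
        return json_decode($this->rows[$id], true);
    }
}

$store = new FilterStateStore();
$state = ['directory-id' => 314159, 'page' => 4, 'rpp' => 10];

$id = $store->save($state);
$url = '/products?state=' . $id; // hypothetical path and parameter name
```

Because the identifier is a truncated hash, a real implementation would need to detect collisions, and the URL only stays linkable for as long as the stored row is retained.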