objects, commonly known as BLOBs (binary large objects). There are two
obvious options: store them in the filesystem, or store them in a database. And by
twisting arms we get the third obvious approach, a mix of the two. There are
some studies in the database academic community on this tradeoff.
Thinking carefully, a database and a filesystem work in a similar way when
storing objects. Both store data in a tree fashion, providing fast access to
any object. Both can add or delete objects. And both suffer from
fragmentation.
Fragmentation is the key issue in my test case. So what is my test case?
It consists of lots and lots of small files (and I mean really small files)
and some very large files. Just to give an example, I have a total of 20000
files. Of them, about:
500 are above 4K
1000 are 2-4K
2000 are 1-2K
16000 are less than 1K
There is a reason for this extremely skewed distribution. The files are
textcaches in my home directory. The main contributing factors are about 8000
files (a mix of source code, tex, pdf and html and oh ... lots of pictures)
and 19000 emails (without spam). Now, source code files are their own caches,
i.e. a separate textcache is not stored for them. The same goes for plain txt.
Pictures and other media have no text to cache. So the bulk of the cached text
comes from html, pdf and office documents. Most such documents in my home
directory are pretty large, i.e. they have a significant amount of textual
content. This is in sharp contrast to the behaviour for emails. The textual
content of the emails is stored. Also, unlike the html files on disk, the HTML
parts in these emails are small, really small. Those contribute to the huge
number of extremely small files. And if you are wondering how I ended up with
3000 absurdly small 20-byte files: I store a gzipped copy of the extracted
text in the textcache.
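In case that 20-byte figure looks odd: the gzip format alone costs exactly 20
bytes (a 10-byte header, a 2-byte deflate block for empty input and an 8-byte
CRC/size trailer), so a 20-byte cache file is presumably just an empty
extracted text. A quick check, sketched here in Python:

    import gzip

    # gzip of nothing: header (10) + empty deflate block (2) + trailer (8)
    print(len(gzip.compress(b"")))  # prints 20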
So now the question is how to best store data with such a distribution. The
simplest thing is to use the filesystem. Filesystems are equipped with
sophisticated methods to avoid fragmentation. But with a fixed block size
(like 4K on my home directory partition), there is bound to be some internal
fragmentation when storing small files like those above (I will put a number
on this below). On the other hand, historically databases were concerned with
storing small strings (with all sorts of query and update speedups; and they
are pretty good at that). Recently, most databases have added support for
large binary objects.
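Putting a number on that internal fragmentation: with 4K blocks, every file
occupies at least one block, no matter how little data it actually holds.
A back-of-the-envelope calculation for my test case:

    # Lower bound on disk allocation for the sub-4K files on a
    # filesystem with a 4K block size: one block per file minimum.
    BLOCK = 4096
    small_files = 19500                  # files at or below 4K in my case
    print(small_files * BLOCK / 2**20)   # ~76 MB of allocated blocks

So the small files alone would pin down roughly 76MB of blocks, most of it
slack.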
There is also a small twist to this problem. The data is mostly static, but
there are deletions (and sometimes modifications). The last important factor
is that I am using the embedded SQLite database, which stores all its data in
a single file; so in essence, what I get is a "fast retrieval and
update"-enabled huge tar file.
The prevailing folklore is that a database is better for smaller objects and
the filesystem for larger ones. What counts as small and what counts as large
depends on the nature of the problem at hand. There are some papers doing a
detailed analysis of the tradeoff, and they conclude somewhat along the same
lines. I found one paper showing that adding and deleting lots of large
objects over time causes enough fragmentation in a database that it can lag
behind a filesystem.
I am not much of an experimental person myself. With the above test case, I
did not even think twice about what I should do. Well, for other reasons
(like heap fragmentation in my application, since reading from a database
would require me to hold the entire object in memory) I was a bit biased
towards the filesystem approach. So I decided to take the middle path, a mix
of filesystem and database. Thanks to decent APIs, implementing the hybrid
approach did not turn out to be anything complicated: I store the first 4K in
a buffer; if more data arrives, I move everything to a file, otherwise I save
the buffer in the database.
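For concreteness, here is a minimal sketch of that decision logic, again in
Python and assuming a cache table like the one sketched above; the function
and parameter names are invented, not the ones in my code:

    import os

    THRESHOLD = 4096  # buffer the first 4K before deciding

    def store(key, chunks, db, spill_dir):
        """Small objects (<= 4K) go to the database, larger ones to a file."""
        buf = b""
        it = iter(chunks)
        for chunk in it:
            buf += chunk
            if len(buf) > THRESHOLD:
                # More than 4K arrived: move everything to a file instead.
                # (key is assumed to be a filesystem-safe name here.)
                with open(os.path.join(spill_dir, key), "wb") as f:
                    f.write(buf)
                    for rest in it:
                        f.write(rest)
                return "file"
        # The whole object fit in the buffer: keep it in the database.
        db.execute("INSERT OR REPLACE INTO cache VALUES (?, ?)", (key, buf))
        return "db"

The nice property is that the decision is made on the fly, without knowing
the object size up front.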
All the 19500 files smaller than 4K are now safely sleeping in the database;
each would otherwise have taken a full 4K block.
(* The title is not entirely correct. I did not have to waste a 4K block for
each of the 19500 files, but now my SQLite database is 26MB instead of
something much smaller. But I am confident this is good for mankind at large,
and for the dolphins too.)
