Wednesday, April 04, 2007

TinyBeagle or a Lucene Example

Recently I read this interesting comment on an OSNews article. It tried to briefly summarize what beagle is. I take users' comments very seriously, and this person seems to know some of beagle's internals, so I thought maybe he is right (modulo some factual errors). Maybe the only new things in beagle are the crawler, the GUI and the scheduler; maybe it's mostly a little C# glue code tying together a few third-party apps.

So I wrote a small Lucene.Net-based file indexer and query program. You index by
mono LuceneLocate.exe /path/to/index/dir index /directory/to/index
and query by
mono LuceneLocate.exe /path/to/index/dir query query_term
It is a pretty simple program: 85 lines of actual code. Performance is incredibly fast. Using an external program ('cat') to read the files in a directory (recursively), it indexes 180 files in 0.06 seconds. A query returning 44 results took 0.0015 seconds. It takes 24 MB virtual memory and 5.3 MB RSS-Shared. No GUI yet. I could have added a scheduler to pause for 10 seconds after every 10 files (5 more lines). This Lucene.Net-based crawler and indexer beats beagle in performance, but it is nowhere close to being beagle.
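For readers who want to see the shape of such a program, here is a minimal sketch of the same crawl, index and query loop. It is not the LuceneLocate source and it does not use Lucene.Net: it is a toy in-memory inverted index in plain Python, and the names `tokenize`, `build_index`, `query`, `batch_size` and `pause_seconds` are made up for this illustration. The optional pause stands in for the "sleep after every 10 files" scheduler idea.

```python
import os
import re
import time
from collections import defaultdict

# Assumption: a "term" is a run of word characters, compared case-insensitively.
TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())


def build_index(root, batch_size=10, pause_seconds=0.0):
    """Walk root recursively and map each token to the set of files containing it.

    After every batch_size files the crawler sleeps for pause_seconds
    (hypothetical stand-in for a scheduler; disabled by default).
    """
    index = defaultdict(set)
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            for token in tokenize(text):
                index[token].add(path)
            count += 1
            if pause_seconds > 0 and count % batch_size == 0:
                time.sleep(pause_seconds)
    return index


def query(index, term):
    """Return the sorted list of files that contain the single term."""
    return sorted(index.get(term.lower(), ()))
```

Calling `build_index("/directory/to/index")` and then `query(index, "query_term")` mirrors the two commands above. Unlike a Lucene index, this one lives only in memory and does no ranking, so it illustrates the idea rather than reproducing the timings quoted here.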

Maybe beagle is not a lucene-powered locate. After all, to err is human.
