Thursday, December 13, 2007

Enterprise search OR How to index on-demand

If, like me, you keep your filesystem organized, have a relatively
unchanging home directory, or simply do not want realtime indexing, you
can use beagle-build-index to meet your needs.

beagle-build-index builds static indexes from files. Static indexes are
created and updated on demand when beagle-build-index is run, but the
directories are otherwise not monitored for changes. The next time the
command is run, the changes are registered in the index. Once the static
index is created, you can ask beagled to search in it (by
passing --add-static-backend /location/of/static/index). beagled need not be
stopped while beagle-build-index runs; it will automatically use the
updated index for searching once beagle-build-index finishes.
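
As a rough sketch of the two steps above, the shell functions below wrap the index build and the beagled start. The --add-static-backend option is the one mentioned in the text; --target and --recursive are assumed to be the beagle-build-index options for the index location and for walking subdirectories, so check beagle-build-index --help on your version. The function names and the DRY_RUN variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build or update a static index of directory $1, stored in directory $2.
# --target and --recursive are assumed option names; verify with --help.
build_static_index() {
    run beagle-build-index --target "$2" --recursive "$1"
}

# Start beagled with the static index at $1 added as a search backend.
start_beagled_with_static_index() {
    run beagled --add-static-backend "$1"
}
```

With example paths of your own choosing, you would call build_static_index once, then start_beagled_with_static_index on the same index location, and afterwards rerun only build_static_index whenever the files change.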

That was for files. If you want to do the same for anything else, say emails,
notes or the address book, and you do not want realtime monitoring, start
beagled normally and let the indexing finish. Then stop beagled and restart
it with the parameter --disable-scheduler. Unfortunately, to update the index
with changes, beagled needs to be stopped, started normally, allowed to
run until the index is up to date, stopped, and then started again with that
parameter.
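
One update cycle of this stop/start dance might look like the sketch below. It assumes beagle-shutdown is available to stop the daemon and that beagled puts itself in the background when started; the fixed sleep is only a crude stand-in for "wait until indexing is done", and INDEX_WAIT_SECONDS, DRY_RUN and the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One refresh cycle for the non-static indexes (emails, notes, ...).
refresh_live_indexes() {
    run beagle-shutdown                     # stop the scheduler-less daemon
    run beagled                             # normal start, so the index catches up
    run sleep "${INDEX_WAIT_SECONDS:-600}"  # crude wait; a fixed delay is an assumption
    run beagle-shutdown                     # stop again once indexing should be done
    run beagled --disable-scheduler         # back to search-only mode
}
```

How long the wait needs to be depends entirely on how much has changed, so the delay should be tuned or replaced by a manual check before the second shutdown.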

If you are a system administrator managing lots of users and you don't want to
run beagled in realtime indexing mode for all of them, you can use the above
procedure to create/update static indexes, say once a day.
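
For the once-a-day case, a minimal sketch could loop over the users' home directories and rebuild one static index per user from cron. The directory layout, the script name in the crontab comment, the DRY_RUN variable and the function names are all hypothetical, and --target and --recursive are assumed option names to be checked against beagle-build-index --help.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Rebuild one static index per directory found under $1 (for example /home),
# writing each index to $2/<directory name>. Both locations are placeholders.
index_all_homes() {
    for home in "$1"/*/; do
        [ -d "$home" ] || continue
        user=$(basename "$home")
        run beagle-build-index --target "$2/$user" --recursive "${home%/}"
    done
}

# Example crontab entry (daily at 03:00), assuming this file is saved as
# /usr/local/bin/beagle-nightly and ends with a call to index_all_homes:
#   0 3 * * * /usr/local/bin/beagle-nightly
```

Each user's beagled can then be pointed at the matching index with --add-static-backend.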

Or if you don't like Mono, you can use Recoll for files and mairix for emails.
There are probably many more such tools, but these are the two I know. Just in
case you have not heard of Mono, it is an open source implementation of an
ECMA-standard-compliant C# compiler and a Common Language Runtime, along with
some more goodies. All in all, pretty useful.
