d-Tech-t: December 2007

Friday, December 28, 2007

klik2 klik beagle

Yay! I managed to make a klik2 recipe for beagle. In principle this should enable anyone to just do
$ klik klik2://beagle
and happily run beagle. All the dependency packages will be automatically downloaded and managed in the background. Or to download once and use many times, you can do
$ klik get beagle
$ alias runbeagle='klik run ~/Desktop/beagle.cmg'
$ runbeagle

Not all of the above works right away; klik2 is still under development and looks promising, but it is not finished yet. But if you already have mono and want to run beagle, you can sort of do it now. This works on any distribution that klik2 supports, which covers almost all the major distributions.

1. Get and install klik2.
2. Download http://cs-people.bu.edu/dbera/blogdata/beagle.xml
3. Run the recipe:
$ klik get beagle.xml

It will do a lot of stuff and end up failing since there is no single app called "beagle" in the beagle package. It will however create the file ~/Desktop/beagle_0.3.1-2.cmg

4. Run beagled.

$ klik run beagled ~/Desktop/beagle_0.3.1-2.cmg --fg --backend manpages --backend Files

Browse to http://localhost:4000/ to access beagle using the web interface. You can use it to search, check the indexing status and shut down beagled.
To run the other beagle tools, the pattern is the same:

$ klik run beagle-command ~/Desktop/beagle_0.3.1-2.cmg beagle-command-params
For example, to shut down from the command line:

$ klik run beagle-shutdown ~/Desktop/beagle_0.3.1-2.cmg

The next time you want to run beagled, you need not run the recipe again; start from step 4 straight away!
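
The "klik run tool image params" pattern above can be wrapped in a small shell function. This is only a sketch: the names beagle_klik, BEAGLE_CMG and KLIK_DRY_RUN are made up for this example and are not part of klik2 or beagle, and the image path is the one the recipe created above.

```shell
#!/bin/sh
# Sketch: wrap the "klik run <tool> <image> <params>" pattern from this post.
# BEAGLE_CMG and KLIK_DRY_RUN are hypothetical names used only in this example.
BEAGLE_CMG="${BEAGLE_CMG:-$HOME/Desktop/beagle_0.3.1-2.cmg}"

beagle_klik() {
    tool="$1"   # e.g. beagled, beagle-shutdown
    shift       # the remaining arguments are passed on to the tool
    if [ -n "${KLIK_DRY_RUN:-}" ]; then
        # Dry run: print the command instead of executing it.
        echo klik run "$tool" "$BEAGLE_CMG" "$@"
    else
        klik run "$tool" "$BEAGLE_CMG" "$@"
    fi
}
```

With this in your shell startup file, "beagle_klik beagled --fg --backend manpages --backend Files" starts the daemon and "beagle_klik beagle-shutdown" stops it. Setting KLIK_DRY_RUN=1 only prints the command, which is handy for checking what would be run.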

Thursday, December 27, 2007

one zero-three-two sailed today

Beagle 0.3.2 was released today. On the one hand we are still catching up with the regressions and new bugs that were introduced in the mighty beagle-0.3.0, and on the other hand new features are streaming in. While yelp remains broken with beagle, I was amazed at how easily I can search within the manpages and double click on the results to open them in yelp. A much better alternative to man -K.

In other news, Lukas has started working on providing spelling suggestions ("Did you mean ...?"). There are some technical limitations which are not fully resolved yet, so it did not make it into 0.3.2. It is currently housed in a branch and I hope to release it into the wild soon.

Beagle was not designed as an RDF store at its inception. It will take quite some work to make it a genuine RDF store. But what if there was an RDF adapter that sat between an RDF client and beagle and talked to each of them in its own language, while maintaining sanity? There is ongoing work to overlay a Semweb selectable source on top of beagle. We will see how that goes.

A post on klik2 rekindled my desire to create a klik package for beagle. It will be easier this time since klik2 handles command line programs. I tried the automatically generated Debian recipe and it mostly worked. Mostly, because one of the tools in beagle ran with --version and --help but failed to find some libraries to do anything more. I think all I need to do is to teach klik how to set certain PATHs and environment variables. Pretty exciting, what do you think?

Thursday, December 13, 2007

Enterprise search OR How to index on-demand

If, like me, you keep your filesystem organized, have a relatively
unchanging home directory or simply do not want realtime indexing, you
can use beagle-build-index to meet your needs.

Beagle-build-index builds static indexes from files. Static indexes are
created and updated on demand when beagle-build-index is run but the
directories are otherwise not monitored for changes. The next time the
command is run, the changes are registered in the index. Once the static
index is created, you can ask beagled to search in it (by
passing --add-static-backend /location/of/static/index). beagled need not be
stopped while running beagle-build-index; it will automatically use the
updated index for searching once beagle-build-index finishes.

That was for files. If you want to do the same for anything else, say emails
or notes or addressbook and you do not want realtime monitoring, start
beagled normally and let the indexing finish. Then stop beagled and restart
with the parameter --disable-scheduler. Unfortunately, to update the index
with changes, beagled needs to be stopped, started normally, allowed to run
until the index has been updated, stopped again and then restarted with that
parameter.

If you are a system administrator managing lots of users and you don't want
to run beagled in realtime indexing mode for all of them, you can use the
above procedure to create/update static indexes, say once a day.
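
As a rough sketch, the static index workflow for files could be scripted like
this. SOURCE_DIR, INDEX_DIR, DRY_RUN and the function names are made up for
this example; --recursive and --target are the beagle-build-index options as I
remember them, so check beagle-build-index --help on your version.

```shell
#!/bin/sh
# Sketch: rebuild a static index on demand, e.g. once a day from cron.
# SOURCE_DIR, INDEX_DIR and DRY_RUN are hypothetical names for this example.
SOURCE_DIR="${SOURCE_DIR:-/srv/shared/docs}"
INDEX_DIR="${INDEX_DIR:-/var/cache/beagle/static-docs}"

run() {
    # With DRY_RUN set, print the command instead of executing it.
    if [ -n "${DRY_RUN:-}" ]; then echo "$@"; else "$@"; fi
}

build_static_index() {
    # Create or update the static index; assumes the --recursive and
    # --target options (verify against your beagle-build-index).
    run beagle-build-index --recursive --target "$INDEX_DIR" "$SOURCE_DIR"
}

search_static_index() {
    # Start beagled so that it also searches the static index.
    run beagled --add-static-backend "$INDEX_DIR"
}
```

Calling build_static_index from a daily cron job keeps the index fresh without
any realtime monitoring, and a beagled started through search_static_index
picks up the updated index once the build finishes.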

Or if you don't like mono, you can use Recoll for files and mairix for emails.
There are probably many more such tools but these are the two I know. Just in
case you have not heard about Mono, it is an open source implementation of a
C# compiler and a Common Language Runtime, both compliant with the ECMA
standards. And some more goodies, all in all pretty useful.

Wednesday, December 12, 2007

Many reasons to like, what's yours ?

Beagle 0.3.0 was released at the beginning of December. That was nearly 2 years after 0.2.0 and more than 10 months after the last feature release, and it has been about 2 weeks since then. In the meantime we identified some upgrade problems with 0.3.0 and released 0.3.1, and Mono released 1.2.6.

In contrast to 0.1.0 and 0.2.0, beagle-0.3.0 did not have any single major-impact change. But there were lots of small changes, all over the summer months and the months following them. It was getting increasingly difficult to handle all the small changes without going through the "Release early" trick and at some point we paused development, did a test release and then finally released what we have as a major release. I am personally expecting a fair share of bugs and regressions.

What are these small changes anyway? I will leave out the invisible ones, some of which I have blogged about before, and only explain the ones that will directly impact your desktop usage.

There are 3 new backends: the Thunderbird backend (newly written, much better than the earlier one), the Opera history backend and the Nautilus metadata backend. There is also the TeX filter, one of our most requested ones, and a new audio filter based on Taglib-sharp. There are new Firefox and Epiphany extensions which do a lot more than indexing browsing history and bookmarks.

The UI got some love; in particular, a bunch of useful options were added to beagle-settings, like the backend selection list. For obvious reasons, users should disable the backends they are never going to use.

One of the side effects of the beagle textcache previously was the creation of thousands of small cache files on the disk. People reported that the external fragmentation was wasting a lot of space. The textcache module was redesigned to minimize the fragmentation; I am sure you will appreciate the recovered space. We also compacted the external attributes; besides other benefits that will save some more space.

Two major enhancements were made to the query syntax, which is already quite rich. Date queries are now possible; date queries do not make complete sense without date range queries, so those are possible too. And a new "filetype:" keyword was added, e.g. to search for images use "filetype:image", to search among documents use "filetype:document", etc.
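
To make the "filetype:" keyword concrete, here is a tiny sketch that builds such a query string. The helper name filetype_query is made up for this example; it only assembles text and does not talk to beagle itself.

```shell
#!/bin/sh
# Sketch: compose a query that restricts results with the "filetype:" keyword.
# filetype_query is a hypothetical helper, not part of beagle.
filetype_query() {
    kind="$1"   # e.g. image, document
    shift       # the remaining arguments are ordinary search terms
    if [ "$#" -gt 0 ]; then
        echo "filetype:$kind $*"
    else
        echo "filetype:$kind"
    fi
}
```

The result can then be handed to the command line client, for example: beagle-query "$(filetype_query image sunset)".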

The major complaints against beagle are constantly high CPU load, high memory usage and improper termination (or not exiting at all). The first two are well known and oft discussed. The third problem is not directly brought up, but has been found to be the reason upon close investigation. I gained valuable experience trying to find my way through the web of signals, threads and events in beagle code; a number of key issues were spotted and fixed. Oh, and the first two issues were also dealt with, as much as we could diagnose, but that is nothing new. It will sound funny, but a few of the high CPU and memory problems are direct results of some of our decisions that backfired. Some of them were fixed and the others are being worked on.

Two experimental features were also added. One is a web interface to search beagle from Firefox (gecko based browsers really). You can also create standard bookmarks for common search terms. The neat thing about this web interface, unlike the earlier webservices based one, is that there is no heavyweight server running on beagle's side. This one communicates with beagled using the XML based BeagleClient API and builds the entire GUI on the client side; a pure Web 2.0 AJAX/XSLT/CSS webapp (ok, these are some cheap buzzwords).

The other fancy feature is searching other beagle daemons over the network. Using Avahi you can even publish your beagle daemon or discover other beagle daemons in the network. We haven't quite figured out how to handle security, authentication and some other issues. So the feature is disabled by default and marked as experimental, but I believe it can be used in some innovative ways.

We received requests from some distributions for global config files, which are useful for both distributions and sysadmins. Some useful global configuration settings would be excluding certain directories from indexing for all users, adding or removing file ignore patterns from the default list, and disabling the KDE backends by default in pure Gnome distributions. Some of the options were moved from the code to the config files, so that they can be set globally and overridden by individual users.

These are only some of the major ones.

Lastly, the reason I got excited about mono-1.2.6 is that it has some fixes and improvements that will be directly visible when using beagle.
