d-Tech-t: February 2007

Monday, February 19, 2007

kBeagleBar is alive!

I had not expected that people still use kBeagleBar, but there are packages built for it and there was even an article about it and other KDE beagle search tools. Nice!

Faceless Bugs and Advanced Users

These are really two very different topics but they came to my mind while reading about Linus' Gnome patches and bugs.

The first is about creating a new account when I need to report a bug or submit a patch for some software. Most projects prefer that you attach it in bugzilla or send it to their members-only mailing list. I am extremely reluctant to create new accounts, so I have created bugzilla and mailing list accounts only for KDE and Gnome. That covers a lot of ground. But still, now and then I need to send something somewhere else and bam! Sign up for an account, sir! There is definitely merit in this approach, since otherwise bugzilla and mailing lists would be flooded with spam. But it definitely keeps me from submitting patches or commenting on something, due to my lack of interest in new accounts. Last week, a friend of mine (the inventor of Sperner's Game) was trying to install Kubuntu on his brand new Lenovo T60 when he spotted some typos in the installation windows. He was ready and willing to file a bug in Kubuntu and was told to create a new account for the Kubuntu bugzilla. As always, he was supposed to get a confirmation email.

The email came 12 hours later and I do not know if the bug was ever filed! Even if the email was prompt, the desire to report a bug has to be high enough to cross these technical potential barriers. *sigh*

This week I made extensive additions to the beagle query syntax. There is an open bug in bugzilla asking for a visual way to add these advanced query expressions in beagle-search. I was thinking about how best to achieve that; it is not easy to capture the power of beagle query expressions in a GUI. I found the answer while reading some posts on the desktop-architect mailing list about Linus' patch. There is no such thing as an expert user or a novice user. Users always try to act as if they are smart and take the path of the expert user. Presenting different sets of options for these different classes of users does not work in practice.

In a similar vein, there is no need for a GUI for advanced query expressions. Novice users, i.e. users who simply enter search terms, will never know what a full boolean query expression does (with those OR and excluded expressions). On the other hand, expert users who know how to deal with boolean expressions, the different keywords for property search and other advanced syntax can write it by hand anyway. In fact, it is much easier for them to write it by hand than to do it visually. In this matter, I like the approach taken by Google. I think I will push towards a simpler advanced search UI for beagle-search and Kerry, with some simple choices like type of file, extension, date range etc. Write the query by hand if you need that extra ounce.
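To make the "simple choices" idea concrete, here is a minimal sketch of how a small form could be turned into a query string that a user might otherwise type by hand. The function name `build_query` is hypothetical, and the `type:` and `ext:` keywords and the leading `-` for excluded words are assumptions used for illustration; they have not been checked against the actual beagle query syntax. Date ranges are left out because their syntax is not described here.

```python
def build_query(terms, file_type=None, extension=None, exclude=()):
    """Combine plain search terms with a few optional form choices.

    The keyword spellings below ("type:", "ext:", "-word") are
    illustrative assumptions, not confirmed beagle syntax.
    """
    parts = list(terms)
    if file_type:
        parts.append(f"type:{file_type}")
    if extension:
        # Accept both "pdf" and ".pdf" from the form field.
        parts.append(f"ext:{extension.lstrip('.')}")
    # Each excluded word is prefixed with a minus sign.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(["holiday", "photos"], extension=".jpg", exclude=["draft"]))
```

Anything more elaborate, such as nested OR expressions, would stay out of the form and be typed directly.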

Sunday, February 11, 2007

beagle memory usage

Setup: Fresh run of beagled with only the kmail backend. IndexInfo reports about 13700 items, i.e. mails and indexed attachments. The Beagle version is post-0.2.16, so that includes the individual items in archive attachments as well. I started beagled as exercise_the_dog, indexing finished within an hour, and this is the state after indexing was over.


VIRT     RES    SHR    COMMAND
-------+------+------+--------------------
167m     55m    11m    mozilla-firefox
248m     29m    2856   X
137m     20m    15m    amarokapp
72812    19m    6884   beagled-helper
89560    18m    13m    kmail
35816    15m    11m    konsole
42780    15m    14m    konqueror
49088    12m    5860   beagled
40320    11m    9524   kdesktop
42004    11m    9.9m   basket
32620    10m    9588   kmix
43896    9252   6220   kicker
37904    5884   3908   kded
34560    5600   2680   net_applet

Remember the rule: an approximate idea of the memory usage is given by RES-SHR.
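As a worked example of that rule of thumb, the small sketch below applies RES-SHR to the two beagle rows from the table. It assumes the figures come from top, where a plain number is in KiB and an "m" suffix means MiB; the helper names `to_kib` and `private_kib` are made up for this illustration.

```python
def to_kib(field):
    """Convert a top-style memory field ("5860", "12m", "1.2g") to KiB."""
    field = field.strip().lower()
    if field.endswith("g"):
        return float(field[:-1]) * 1024 * 1024
    if field.endswith("m"):
        return float(field[:-1]) * 1024
    return float(field)


def private_kib(res, shr):
    """Rule-of-thumb estimate of a process's own memory: RES minus SHR."""
    return to_kib(res) - to_kib(shr)


if __name__ == "__main__":
    # RES and SHR columns for the two beagle processes in the table.
    rows = {"beagled": ("12m", "5860"), "beagled-helper": ("19m", "6884")}
    for name, (res, shr) in rows.items():
        kib = private_kib(res, shr)
        print(f"{name}: about {kib / 1024:.1f} MiB ({kib:.0f} KiB)")
```

By this estimate beagled accounts for 12288 - 5860 = 6428 KiB and beagled-helper for 19456 - 6884 = 12572 KiB. Since top rounds the "m" values, these are rough figures only.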

Thursday, February 08, 2007

beagle: Eat less, talk less, be smart

Yesterday, beagle 0.2.16 was released. A couple of weeks back, we released 0.2.15 but I did not write about it. 0.2.15 came with a lot of performance and memory improvements, new backends, new features and lots of important changes. In the process, it also broke a few things. Those were fixed, and 0.2.16 is purely a bugfix release for 0.2.15. I consider 0.2.16 the best beagle release ever. Incidentally, the 0.2.13+ releases all had some nasty problem or other.

Combining 0.2.15 and 0.2.16, these are the major improvements:

* Very important: the looping bug is fixed. I would even like to claim, fixed forever. I happened to find an important clue while scanning the logs and other information provided by some of our very friendly and helpful users. Eventually our 3-year-old database schema was found to be incorrect. Joe finally cleared up the mess. Thanks Brian and Rick! This also means an end to the "log file filling hard disk" or "beagle indexing even after a week" type of problems.

* Beagle uses some external tools to filter files, e.g. pdfinfo, pdftotext, mplayer, yada yada. These programs are well written and almost always work. Except when some very malformed file, or one with a wrongly detected mimetype, is sent to them and they go berserk, taking up an insane amount of memory or CPU time. Since the early releases, we used to maintain that there is no way we can control the external processes. After all, we just use 'em. Joe finally put an end to that excuse by using some smart rlimit tricks to limit the resources used by these external processes. We still cannot control how mplayer might behave if given a Word doc file, but if it behaves badly it will be killed before too long.

* Indexing data is a strenuous job. Think about all those heavy applications which process or generate these files. But people want indexing to be as silent as possible. There are frequent recommendations that beagle should use a high nice value, low system priority, low IO priority and similar means to be as unobtrusive as possible. The fact is, beagle already does that. However, now we go one step further by using the SCHED_BATCH scheduler policy.
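The rlimit idea from the second point can be sketched in a few lines. This is not beagle's actual code (beagle is not written in Python); it is a minimal illustration, assuming a Linux or other POSIX system, of running a helper program under a CPU-time limit and an optional address-space limit so that a runaway helper is killed by the kernel. The function name `run_limited` and its parameters are hypothetical.

```python
import resource
import signal
import subprocess
import sys


def run_limited(argv, cpu_seconds, max_bytes=None, wall_timeout=60):
    """Run argv with resource limits applied to the child only.

    Returns (returncode, stdout). A negative returncode means the child
    was killed by that signal number, e.g. SIGXCPU for CPU overuse.
    """
    def apply_limits():
        # Runs in the child between fork and exec, so the parent's own
        # limits are left untouched.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))
        if max_bytes is not None:
            resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    proc = subprocess.run(
        argv,
        preexec_fn=apply_limits,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        timeout=wall_timeout,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # A stand-in for a misbehaving filter: spins forever on the CPU.
    rc, _ = run_limited([sys.executable, "-c", "while True: pass"], cpu_seconds=1)
    if rc < 0:
        print("helper killed by", signal.Signals(-rc).name)
```

The caller does not need to watch the helper itself; it only has to check the return code and treat a signal-terminated helper as a file that could not be filtered.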

There are other side improvements too. The RTF filter is new; the current one is based on the legendary RTF parser by Paul Dubois. Image filters are almost new; we now have Konversation (KDE IRC client) and KOrganizer (KDE tasks and events scheduler) backends. By the way, soon after 0.2.16 was released, an Opera webhistory backend was added to trunk. You can just drop the binary from here into your 0.2.16 /usr/lib/beagle/Backends folder and start using it, err... trying it. I do not know how complete it is.

I would like to end by thanking the excellent user base that beagle has developed. Without them, it would not be possible to fix a whole lot of these problems. Beagle would not be what it is today without them.
