Thursday, May 17, 2007

Silence of the Dog

Lately beagle releases have slowed down quite a bit; there were a few bug-fix 0.2.16.x releases and another 0.2.17 bug-fix release (it was supposed to be 0.2.16.4, but the changelog was too large for a point release). The underlying goal is to get ready for 0.3.0; svn trunk is changing so rapidly these days that it is difficult to isolate the simple changes and make them into a 0.2.x release. On the other hand, the remaining changes are too major to be put into a 0.2.x release (they would also need extensive real-life testing).

Recently I moved beagle from entagged-sharp to taglib-sharp for filtering music files. I was told entagged is no longer actively maintained and taglib is definitely seeing a lot of rapid development. My timing was not quite right: the 4th March news "Entagged is unmaintained" was followed by the 28th March news "Entagged is maintained". I came to know about it only after I made the transition. Too late! On the plus side, taglib# has support for a larger number of formats and is being used by Muine and Banshee, so expect sharing of taglib-sharp libraries. Unfortunately, there are no taglib-sharp packages out there yet (there is a proposal for a debian package), so all the mono apps are currently including taglib-sharp as source. We too initially included the source, then removed it and instead linked against the package. But if there are no packages for the major distributions, it might make sense to include the source again; compiling Beagle is pretty demanding anyway.

In other news, I used the extremely handy heap-shot to identify that instances of IndexReader were not being GC-ed even long after the corresponding method ended. Explicitly setting them to null immediately freed them. I suspect some thread local storage magic is happening behind my back. Note to self: set IndexReaders to null immediately after they are closed. Did I say heap-shot is amazing?!

There are several more improvements to the speed and memory performance of IndexHelper and BuildIndex. One notable feature I added was to reduce re-indexing of files which could not be filtered before. Due to the inherently distributed nature of beagle indexing, the crawler is always separated from the indexer. So if the crawler finds some file which was not filtered before, it has to re-submit it to the indexer. Who knows! There might be a suitable filter now. The downside was that a lot of files were being repeatedly re-tried by the indexer, slowing down the whole process. I decided to store the names of the files containing the filters and their last modified times in a filterver.dat (akin to mozilla pluginreg.dat) and, if the filters have not changed since the last run, assume that there is no newer filter. Fair guess I would say.
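The filterver.dat idea can be sketched roughly as follows. This is a minimal Python illustration, not Beagle's actual C# code: the function names, the JSON on-disk format and the "one directory of filter files" layout are all assumptions made for the example.

```python
import json
import os


def snapshot_filters(filter_dir):
    """Map each filter file in filter_dir to its last-modified time (ns)."""
    snapshot = {}
    for name in sorted(os.listdir(filter_dir)):
        path = os.path.join(filter_dir, name)
        if os.path.isfile(path):
            snapshot[name] = os.stat(path).st_mtime_ns
    return snapshot


def filters_changed(filter_dir, state_file):
    """Return True if the filter set differs from the one recorded last run.

    state_file plays the role of filterver.dat (hypothetical JSON format).
    It is rewritten on every call so the next run compares against today.
    """
    current = snapshot_filters(filter_dir)
    try:
        with open(state_file) as f:
            previous = json.load(f)
    except (FileNotFoundError, ValueError):
        previous = None  # no usable record: a newer filter may exist
    with open(state_file, "w") as f:
        json.dump(current, f)
    return previous != current
```

With something like this, the crawler would re-submit previously unfiltered files only when filters_changed(...) returns True, and skip them otherwise.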

Beagle has known how to index email attachments for quite some time; some months ago it also got the ability to index archives. However, all along this was done by extracting each included file to a temporary file and then indexing it. This was done primarily because of the way included content (aka child indexables) was handled and also because some of the filters only worked on physical files and not streams. This whole temporary file business never pleased me: there were race conditions which could leave undeleted temporary files in the system, even small included files had to be written to disk, and extracting the contents of an archive to index it defeated the whole purpose of archiving it. Last week, I added the infrastructure to allow indexing of archives and email attachments without extracting them, if the filter permits of course. The infrastructure is there; the archive and email filters still need to be modified to take advantage of it.
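To illustrate the difference, here is a small Python sketch of indexing archive members straight from a stream, with nothing written to disk. It is only an analogy for the approach described above, not Beagle's infrastructure: index_archive, iter_archive_members and the extract_text stand-in filter are hypothetical names, and the example handles zip archives only.

```python
import zipfile


def iter_archive_members(archive_stream):
    """Yield (name, stream) for each regular file in a zip archive.

    Members are read directly from the archive; no temporary files.
    """
    with zipfile.ZipFile(archive_stream) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            with archive.open(info) as member:
                yield info.filename, member


def extract_text(stream):
    """Stand-in for a stream-capable filter: decode the bytes as UTF-8."""
    return stream.read().decode("utf-8", errors="replace")


def index_archive(archive_stream):
    """Return {member name: extracted text} for one archive."""
    return {
        name: extract_text(member)
        for name, member in iter_archive_members(archive_stream)
    }
```

A filter that insists on a physical file would still need the temporary-file path, which is why the stream route only applies "if the filter permits".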

Finally, one feature I personally would like to see in 0.3 is support for XMP sidecars. XMP sidecars allow users to add a separate file.ext.xmp file containing arbitrary metadata (but in the XMP format) about file.ext. It is a really extensible solution for metadata. The main part of the code is in svn trunk; it still does not handle renaming or deleting of xmp files. Hopefully it will be finished in time.
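The file.ext / file.ext.xmp naming convention is simple enough to sketch in a few lines of Python. The helper names are made up for illustration and say nothing about how the code in svn trunk is organised.

```python
SIDECAR_SUFFIX = ".xmp"


def sidecar_for(path):
    """Return the sidecar path that would describe `path`."""
    return path + SIDECAR_SUFFIX


def described_file(path):
    """Return the file a sidecar describes, or None if `path` is not a sidecar."""
    if path.lower().endswith(SIDECAR_SUFFIX) and len(path) > len(SIDECAR_SUFFIX):
        return path[: -len(SIDECAR_SUFFIX)]
    return None
```

For example, sidecar_for("photo.jpg") gives "photo.jpg.xmp", and described_file("photo.jpg.xmp") maps back to "photo.jpg". Renames and deletions are the awkward part because both halves of such a pair have to be kept in step.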

This will probably be my last post before my annual break to the land of mangoes ("fruit of the gods"). Sadly, I have (knowingly) only tasted about a dozen varieties of mangoes, out of over 300. I will definitely try to increase that number this time. Next post, July.

Thursday, May 03, 2007

Upgrade to Spring

Yesterday, on a whim, I decided to upgrade to Mandriva's latest release.
The steps included:
- backing up .kde, .kderc, .qt, .gtk* and .local
- logging out of kde
- setting up a mirror as a distribution source for urpmi (mandriva is offering
non-free e.g. sun-java in its free source these days, but I still need the
PLF sources for codecs, BCI enabled freetype and fontconfig and a few other
things)
- # urpmi urpmi
- # urpmi <bunch of> kernels
- # urpmi --auto-select
and then selecting the ones I would like to upgrade from them

Soon I was running Mandriva 2007.1 Spring (Free). Yay! It's beautiful. The
Ia_ora theme and other Mandriva artwork are gorgeous. I legally own an XP CD,
from which I extensively use Verdana (for text) and Tahoma (for widgets).
They look wonderful as always with plf freetype (w/ hinting). I like to use
large fonts, enough to be readable from 4 ft away, but somehow the DejaVu or
Bitstream fonts have a weird fuzziness in the curves of 's' and 'o's. I
cleared the settings of a test user account, and a default new account looks
quite good (apart from the kbfx-styled mandriva menu).

KDE was upgraded to 3.5.6; I was actually using a few kde-3.5.6 packages from
cooker, so there was no huge surprise. I was worried that mandriva would mess
up some of my settings when I logged in for the first time as my normal user,
but thankfully that did not happen. The system feels faster; konsole
definitely starts faster than before. Overall, I am extremely pleased with
2007.1; I wish them all the best.
http://wiki.mandriva.com/en/Releases/Mandriva/2007.1/Tour

Some minor annoyances:
- mandriva kernel (based on 2.6.17) still has the weird cpufreq bug where
scaling_max_freq is the same as scaling_min_freq (thus rendering all the
governors useless). It is probably the same problem described in
http://www.mail-archive.com/linux-acpi@vger.kernel.org/msg04484.html
- Suspend to RAM is broken in mm kernel (Mandriva has moved to pm-utils and it
works like a charm w/ the default kernel)
- tmb kernel has some problems with high CPU usage when copying files and broken
resume from s2ram. I still need to test the other tmb versions. I really like
the tmb kernel improvements.