Thursday, September 13, 2007

Tracker-ing looks fun

Jamie checked in thunderbird support to Tracker today. How do I know ? Because I subscribe to svn-commits-list and my gmail filter picked out his commit adding the beagle thunderbird backend (probably with some modifications, I haven't checked the changeset) to tracker.

This is looking pretty cool. I also found out Tracker-0.6 was released with some extremely useful features and a damn good UI. The UI groups the documents in an intuitive way, shows more context information from matches, and allows you to do tagging in place (and has gradients, uh-oh). The core daemon (is that what it is called ?) can scan evolution mails and pidgin logs, and now it can also index thunderbird mails. It lists XMP metadata support too - which is nice and something I never managed to finish in beagle.

I am pretty sure people will find it useful. I could not resist admiring it and sharing it with others. Well, I was told that tracker is set to become the default desktop search engine in ubuntu, so a lot of unhappy souls are about to become happy soon.

As for me, I will continue using beagle. Why ? Because it does what I need it to do. And if it does not do something, just remember it is written in C# :). Meanwhile, you can make your own decision.

Sunday, September 09, 2007

How is WebBeagle for a name ?

(Or FireBeagle ?)

I always wanted to search beagle using a web browser. My desktop searches are mostly infrequent but complicated, so when I am unable to find something I definitely need an index-based search tool (e.g. beagle); on the other hand, I don't want to keep an application open on my desktop for a long time (beagle does not consume significant memory for me, so it can keep running in the background). That was one reason why I wrote the kio slave for beagle (which I stopped liking a long time ago) and followed it with a KDE deskbar-like applet (which is nice but has limited functionality). Now that I am done with shameless advertisements, let me share with you yet another way to query beagle.

Today I wrote some ajaxy, XSLT-ed webpages to allow users (read: me) to query beagle using a browser (err... firefox; konqueror-3.5.5 does not have an XSLT processor). It uses the networkservice backend that can be used to query beagle over the network (based on last year's Google Summer of Code projects). It relies on internal knowledge of how queries are serialized, lists all the information in a boring way and does not show snippets (yet). But it works, and was reasonably fast in displaying 42 results. Besides the boring UI, allowing browsers to access services always opens up some security hole, so it is disabled by default. If you so desire, use it, but at your own risk (check the commit log for how to turn this feature on).
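If you are curious about what the pages actually do, the browser side amounts to very little: post the serialized query XML to beagled's network service, parse the reply, and run it through an XSL stylesheet. The sketch below is only an illustration; the endpoint, port and query message are placeholders, not the real wire format (the real pages hardwire whatever beagled expects, see point 5 below).

  // Rough sketch only: the URL, port and <Query> message are placeholders,
  // not the actual serialization used by beagle's network service.
  var resultsStylesheet;   // XSL stylesheet for the results, loaded elsewhere

  function queryBeagle(terms, onResults) {
      var req = new XMLHttpRequest();
      req.open("POST", "http://localhost:8888/", true);   // hypothetical endpoint
      req.onreadystatechange = function () {
          if (req.readyState != 4 || req.status != 200)
              return;
          // Parse the XML reply and render it through XSLT; this is the part
          // konqueror-3.5.5 cannot do, since it has no XSLT processor.
          var doc = new DOMParser().parseFromString(req.responseText, "text/xml");
          var xslt = new XSLTProcessor();
          xslt.importStylesheet(resultsStylesheet);
          onResults(xslt.transformToFragment(doc, document));
      };
      req.setRequestHeader("Content-Type", "text/xml");
      req.send("<Query><Text>" + terms + "</Text></Query>");   // hand-built request
  }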

I do hope to get this feature properly implemented. The things that need to be done are:
  1. Fix the network-backend (it is suffering from some crashes).
  2. Probably related to the network backend as well, some kind of search authorization is needed.
  3. Use CSS + javascript for the results page to group/sort the results, and make them look decent.
  4. Get snippets and display them. I am thinking of retrieving snippets only on demand (a rough sketch of that idea follows this list). Somehow the name and the location of files, or the sender and the subject of emails, help me more in filtering out search results than snippets do. Which is quite unlike how I use web search engines.
  5. Figure out a way to use the C# or libbeagle API to create the xml request messages. Currently they are hardwired. If the solution turns out to be too complicated, it might not be a bad thing to leave the format hardcoded as it is now.
  6. More cosmetic, separate command line and configuration options for this feature.
  7. This is more related to the network query implementation; figure out how to use the QueryDomain thing meaningfully. The results from some backends only make sense on the same machine, and even for some of them it is tricky to open the right application just from the URI itself. So does it make sense to show e.g. evolution mail hits in the browser ? Does it make sense to return gaim (pidgin) hits when queried over network ?
  8. How does the browser behave (read: choke) when it receives 1000 results (100 results each from 10 backends)? The DOMParser has to parse a huge string and form a huge DOM of 1000 Hit nodes. You see why I don't like to get all the snippets beforehand? :)
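To make point 4 (and the worry in point 8) concrete, here is roughly what I have in mind for on-demand snippets. The "/snippet" endpoint and its "uri" parameter are invented for illustration; the real request would go through whatever interface the network service ends up exposing.

  // Illustration of fetching a snippet only when a hit is expanded.
  // The endpoint and parameter below are made up; the point is that the
  // initial results page carries no snippets at all.
  function showSnippet(hitUri, targetDiv) {
      var req = new XMLHttpRequest();
      req.open("GET", "http://localhost:8888/snippet?uri=" +
                      encodeURIComponent(hitUri), true);
      req.onreadystatechange = function () {
          if (req.readyState != 4 || req.status != 200)
              return;
          // One small snippet is fetched and inserted per click, instead of
          // the browser choking on 1000 snippet-laden Hit nodes up front.
          targetDiv.textContent = req.responseText;
      };
      req.send(null);
  }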