Sunday, September 09, 2007

How is WebBeagle for a name?

(Or FireBeagle?)

I always wanted to search beagle using a web browser. My desktop searches are infrequent but complicated, so when I am unable to find something I definitely need an index-based search tool (e.g. beagle); on the other hand, I don't want to keep an application open on my desktop for long (beagle itself does not consume significant memory for me, so it can keep running in the background). That was one reason why I wrote the KIO slave for beagle (which I stopped liking a long time ago) and followed it with a KDE deskbar-like applet (which is nice but has limited functionality). Now that I am done with the shameless advertisements, let me share with you yet another way to query beagle.

Today I wrote some Ajax-y, XSLT-ed webpages to allow users (read: me) to query beagle using a browser (err... Firefox; Konqueror 3.5.5 does not have an XSLT processor). They use the networkservice backend, which can be used to query beagle over the network (based on last year's Google Summer of Code projects). The pages use internal knowledge of how queries are serialized, list all the information in a boring way and do not show snippets (yet). But they work, and were reasonably fast in displaying 42 results. Besides the boring UI, allowing browsers to access services always opens up security holes, so this is disabled by default. If you so desire, use it, but at your own risk (check the commit log for how to turn this feature on).
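
To give an idea of how little there is to it, here is a minimal sketch of the client side. The port, URL and request body below are placeholders (the real messages follow beagle's internal serialization format, which the page currently hardcodes), and xslDoc is assumed to have been fetched earlier via a similar request.

    // A sketch only: the endpoint and the XML payload below are
    // placeholders, not beagle's actual wire format.
    function queryBeagle(terms) {
        var req = new XMLHttpRequest();
        req.open("POST", "http://localhost:8888/", true);
        req.onreadystatechange = function () {
            if (req.readyState == 4 && req.status == 200)
                showResults(req.responseXML); // DOM of the XML reply
        };
        req.send("<Message><Query>" + terms + "</Query></Message>");
    }

    // Run the reply through an XSL stylesheet to get HTML -- this is
    // the step Konqueror 3.5.5 cannot do.
    function showResults(xmlDoc) {
        var proc = new XSLTProcessor();
        proc.importStylesheet(xslDoc); // xslDoc: the XSL file, loaded earlier
        var frag = proc.transformToFragment(xmlDoc, document);
        var out = document.getElementById("results");
        while (out.firstChild) out.removeChild(out.firstChild);
        out.appendChild(frag);
    }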

I do hope to get this feature properly implemented. The things that need to be done are:
  1. Fix the network backend (it is suffering from some crashes).
  2. Probably related to the network backend as well: some kind of search authorization is needed.
  3. Use CSS + JavaScript on the results page to group/sort the results, and make them look decent.
  4. Get snippets and display them. I am thinking of retrieving snippets only on demand. Somehow the name and location of files, or the sender and subject of emails, help me more in filtering search results than snippets do, which is quite unlike how I use web search engines.
  5. Figure out a way to use the C# or libbeagle API to create the XML request messages; currently they are hardwired. If the solution turns out to be too complicated, it might not be a bad thing to leave the format hardcoded as it is now.
  6. More cosmetic: separate command line and configuration options for this feature.
  7. This is more related to the network query implementation: figure out how to use the QueryDomain concept meaningfully. The results from some backends only make sense on the same machine, and even for some of them it is tricky to open the application from the URI alone. So does it make sense to show e.g. Evolution mail hits in the browser? Does it make sense to return Gaim (Pidgin) hits when queried over the network?
  8. How does the browser behave (read: choke) when it receives 1000 results (100 results each from 10 backends)? The DOMParser has to parse a huge string and build a huge DOM of 1000 Hit nodes. You see why I don't like fetching all the snippets beforehand :)? One way to cope is sketched below.
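
On that last point, one possible way to keep the browser responsive is to build the result list in small batches instead of in one go. A rough sketch, assuming the reply has already been parsed into xmlDoc and that each Hit node carries a Uri attribute (a guess on my part):

    // Append hits 50 at a time, yielding to the UI between batches,
    // so 1000 hits do not freeze the page while the list is built.
    function renderHits(xmlDoc, container) {
        var hits = xmlDoc.getElementsByTagName("Hit");
        var i = 0, BATCH = 50;
        function step() {
            for (var n = 0; n < BATCH && i < hits.length; n++, i++) {
                var div = document.createElement("div");
                div.appendChild(document.createTextNode(
                    hits[i].getAttribute("Uri")));
                container.appendChild(div);
            }
            if (i < hits.length)
                setTimeout(step, 0); // let the browser breathe
        }
        step();
    }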

9 comments:

Filia Tao said...

Interesting.
I like this idea.
A web UI is always easier than a GUI.

dBera said...

Tao, wanna help :)?
I first thought of writing an extension, then a protocol handler, and then decided to just fetch all the data using XMLHttpRequest and do the UI on the client side (web-2.0 style).

Filia Tao said...

>> wanna help :)?
Yes, if I can.
I wrote something similar in a course project, using an embedded web server and some simple HTML pages (no JavaScript at all).
But beagle's networkservice backend seems to use XML, so it's the client's responsibility to translate it to HTML, deal with links, form submits and so on.
RIA? :)

dBera said...

Yes, it's an RIA. The data is transferred as XML. In the current implementation it is all in a simple JavaScript function plus an XSL file that turns it into HTML. There is a form which is used to query, so that part is done. The data transfer and receiving part is not a problem.

The JavaScript code in the HTML page will be given an XML DOM with all the information. I need help after that: basically, the super-powerful JavaScript + CSS (with an initial XSLT pass) has to display the results and provide client-side sorting and grouping (e.g. show all Files, or show all hits from Kopete).
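
Roughly the kind of thing I mean, as a sketch only -- the attribute names (Source, Score) are stand-ins, not necessarily what beagle actually sends:

    // Bucket the Hit nodes by their source backend (Files, Kopete,
    // ...) and sort each bucket by score, all on the client side.
    function groupHits(xmlDoc) {
        var hits = xmlDoc.getElementsByTagName("Hit");
        var groups = {};
        for (var i = 0; i < hits.length; i++) {
            var src = hits[i].getAttribute("Source") || "Unknown";
            if (!groups[src]) groups[src] = [];
            groups[src].push(hits[i]);
        }
        for (var name in groups)
            groups[name].sort(function (a, b) {
                return parseFloat(b.getAttribute("Score") || "0")
                     - parseFloat(a.getAttribute("Score") || "0");
            });
        return groups; // e.g. { "Files": [hit, ...], "Kopete": [...] }
    }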

Filia Tao said...

I suggest using jQuery.
There must be a lot of DOM operations, and jQuery is really good at them.
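
For example (a sketch; I am guessing the Hit elements have Source and Uri attributes, as above):

    // The same grouping idea with jQuery: walk the Hit nodes in the
    // parsed reply and append each to a per-source container.
    $(xmlDoc).find("Hit").each(function () {
        var src = $(this).attr("Source") || "Unknown";
        $("#results-" + src).append(
            $("<div/>").text($(this).attr("Uri")));
    });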

Anonymous said...

Hey dBera! Great work on the KMail backend. I had one tiny nit/suggestion: the KMail backend already has all the machinery to index maildir folders. I run an IMAP server locally, which dumps emails in maildir format under ~/Maildir -- which, unfortunately, is NOT indexed by beagle. Why?

dBera said...

Diwaker,
Check out http://mail.gnome.org/archives/dashboard-hackers/2007-September/msg00012.html

Anonymous said...

It's good - if you could tell me how to change the local links from file:///mnt/test/file.txt to file://///server/share/test/file.txt, that would be great. Then I would be able to find files on my Samba server using a remote Windows client.

Anonymous said...

Hello, I'm from Italy. I'm "googling" for a server-side solution for indexing some documents on an Ubuntu 8.04 server without the GUI (GNOME). Can the Beagle web interface help me?