Monday, April 28, 2008

I knew searching would fail

There is a great report about a usability test on the web about a guy
giving some common computer tasks to his girlfriend on a fresh Hardy
Heron installation. I found it via Slashdot but since fewer and fewer
people are slashdotting these days, so here is a link [1].

The tasks are well chosen and the user is _not_ a first time
grandma-type-user and approaches each task in the obvious possible
way. "Obvious possible" - for people not used to a Linux distribution,
where "doing stuff my way" rules over "getting stuff done".

As I started reading it, I knew at some point the user was going to
"search for something". And I knew she was going to fail. Which she

The problem with most (all?) of the linux desktop search applications
is that they are cut out for a particular task and are (hopefully)
pretty good at it. Indexing is the keyword there - how to index all
kinds of data in the best possible way and then allow users to search
the indexed data. And there are lots of sophistication there.
Unfortunately the common search tasks by an user is not quite that.

- Search for a file by name - most common
- Search for files of certain types
- Search for files in home directory containing some text - slightly
advanced usage
- Search among browsed websites etc.
- As a computer "user", it is not clear why I would search for
websites in the desktop search tool and not in the browser. Of course,
once I am told this can be done in the desktop search tool _too_, I
would be extremely glad and nod in appreciation.

It takes time to write a desktop indexing and searching system. I
didnt believe it when I first heard of it and my friends asked me what
is so difficult about it other than implementing inotify. For some
reason it is. So a lot of effort in invested behind that. But there
has been less effort in presenting a failsafe, minimum capability
search experience in that direction. What do I mean by failsafe and
minimum capability ?

- One obvious way to launch the search tool (there could be more, but
there should be one which may not be the best but works in the worst
- The obvious tool should never fail on the basic searching - never,
never, never. By basic searching I mean searching for non-file content
information - name, size, type (what on earth is _mimetype_ to a
non-CS/IT person ? searching by extension is what I mean. broad
classification like music, picture helps).
- Repeat the above. Let me say it this way - if the user knows a file
exists, she should be able to find it by name. And matches by name
should come _first_. Same for search by types.
- Anything else is a bonus. When we have complete semantic desktops,
where a file is same as an email and same as an addressbook entry,
maybe then users would want to search for everything or specify what
exactly she wants. Not now.

So where does beagle fall behind (or some of the others tools, by
reading about them and looking at their screenshots).
- User want to use them to search for files. The tools return pretty
much everything.
- Give them an option on where to search. There is no need to
include an option for "application/rdf+xul" but list the common
options. A search service has to work for the minimum, a GUI has to
cater to the average. I would be sad if it didnt have ways to cater to
the advanced crowd too but I dont mind if that requires one extra
- User wants to search everywhere (in the filesystem).
- Thats definitely not what beagle was designed for. A beagle search
tool is not expected to do that. But when it is presented as the
_main_ search tool to the users, it will be used to search everywhere.
And it will fail.
- I dont quite know how to design a failsafe GUI search tool but a
good start would be use the indexing service to home directory files
and brute-force 'find /usr/bin' for non-home directory partitions.
- Some users would never need to search by content.
- If searching content was cheap then it would not be a big deal if
searching by content is enabled. But content searching is expensive
from my experience. It would be better if users are allowed to opt-in
for content searching.
- Content searching is not supposed to be expensive. As far as
beagle is concerned, it is halfway on meeting this goal. It still
needs some fault-tolerance feature to detect problems before too much
damage has been done.

There are lots of other ways to make searching "just work". The user
does not even need to know there is any indexing in the background.
The sad part is a lot of what I suggested (or could have suggested) is
already possible with the current beagle infrastructure. What is
lacking is someone with a good GUI knowledge to work on improving the
search experience. I am defending the base by fixing the occassional
simple bugs but a real developer is needed. And needed urgently
otherwise yet another distributions will be released with a lacklustre
search experience.

Sunday, April 20, 2008

Takes Two to Release

The GMail backend I blogged about before is now available for mass abuse in Beagle 0.3.6(.1). We also tried to maintain our love of cutting edge technology by upgrading the Firefox extension to work with Firefox 3.0.

I noticed several forum posts where users wanted to use beagle like locate/find-grep. The desire was two pronged - no intention to run a daemon continuously and return files from everywhere doing basic searches in name and path. That is not how beagle is supposed to be used but users are the boss in a community project. So I added blocate, a wrapper to beagle-static-query. Currently it only matches the -d DBpath parameter of locate but works like a charm. Sample uses
  $ blocate sondesh
  $ blocate -d manpages locate

The other thing I added was a locate backend. I absolutely do not recommend using this one. Yet if you insist ... when enabled and used with the FileSystem backend, it will return search results from the locate program. Yes, results from eVeRyWhErE, as you wished.

You can use both the GMail and the locate backends in beagle-search as well. But both the new backends are rather primitive, so I have taken enough precautions againsts n00bs accidentally using them. So in summary, 0.3.6 is not going to do you any good. Oops... did I just say that ?!

The title is based on the empiricial count of the number of actual releases (including brown bag ones) needed for last few releases.

Tuesday, April 08, 2008

You said Google owns your life

And I nodded my head and felt sympathy for you. I also used the I_doubt_its_maintained_anymore and just_barely_works xemail-net to write a live GMail backend for beagle. It does not index the emails as of now, but uses the IMAP search protocol to directly search the emails on the GMail IMAP server. And searching followed by retrieving the headers of the matched emails is really slow; the delay is clearly perceptible. It could be due to xemail but I could not find a better alternative for a .Net IMAP library.

It should not be hard to take the current backend and add the ability to download a batch of emails and index them locally. Google also publishes a nice set of GData .NET API for accessing Google documents, calendars and a lot of other services. A backend for them would at least make our beloved maintainer happy.

A basic GMail query-only backend was on my TODO list for more than year. And now its finally done. Proper GMail indexing and Google service is also on my TODO list. So hope to see it sometime... say by next year.