A Few Words
      

Category:
Technology

What Is The Use Of A Desktop Search Tool?

One word leaps to mind: grep. As Wikipedia defines it: "The name comes from a command in the Unix text editor ed that takes the form g/re/p meaning 'search globally for matches to the regular expression re, and print lines where they are found'". The command can search any mounted drive on a UNIX machine. You can start wherever you like, stop wherever you like, and include or exclude whatever you like. With this simple command, you can perform almost any kind of text search imaginable.
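To make the "start wherever, exclude whatever, include whatever" claim concrete, here is a minimal sketch. It assumes GNU grep (the --include and --exclude-dir options are not part of POSIX grep), and the sample tree and file names are invented purely so the snippet runs on its own.

```shell
#!/bin/sh
# Hypothetical sample tree, created in a temp directory so the example is self-contained.
demo=$(mktemp -d)
mkdir -p "$demo/documents/writings" "$demo/documents/drafts"
printf 'Notes on Charlie Hunter\n' > "$demo/documents/writings/notes.txt"
printf 'charlie hunter draft\n'    > "$demo/documents/drafts/old.txt"
printf 'Charlie Hunter log\n'      > "$demo/documents/writings/run.log"

# -r  start at the given directory and descend
# -i  ignore case
# -l  print only the names of matching files
# --include      look only at *.txt files
# --exclude-dir  skip any directory named "drafts"
matches=$(grep -ril --include='*.txt' --exclude-dir='drafts' 'charlie hunter' "$demo/documents")
echo "$matches"

rm -rf "$demo"
```

Only notes.txt survives both filters: old.txt sits in the excluded directory and run.log does not match the include pattern.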

This all seems so obvious. It made me think that I was missing something really important about the possibilities of "desktop searching". The desktop search tool catalogs the contents of your computer in a way analogous to how search engines catalog the content of the Internet. The tool creates a repository of words and documents, much like the databases used by Google. With this little repository in place, you can then use your web browser (or some stand-alone application) to perform search-engine-esque searches on everything that has been cataloged. In some incarnations, the results show up at the top of search engine results.
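For the curious, the "repository of words and documents" idea can be sketched in a few lines of shell. This is only a toy under my own assumptions, not how any particular desktop search product works: the index format (one "word, tab, file" pair per line), the file names and the lookup helper are all hypothetical.

```shell
#!/bin/sh
# Hypothetical documents to catalog.
demo=$(mktemp -d)
printf 'TreeMap keeps keys sorted\n'  > "$demo/a.txt"
printf 'HashMap does not sort keys\n' > "$demo/b.txt"

# Build the index once: split each file into lowercase words,
# de-duplicate them, and record "word<TAB>file" for every pair.
index="$demo/index.tsv"
for f in "$demo"/*.txt; do
  tr -cs '[:alnum:]' '\n' < "$f" \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u \
    | awk -v f="$f" 'NF { print $0 "\t" f }'
done > "$index"

# A search is now a lookup in the index rather than a scan of every file.
lookup() {
  awk -F '\t' -v w="$1" '$1 == w { print $2 }' "$index"
}

treemap_hits=$(lookup treemap)
keys_hits=$(lookup keys)
echo "$treemap_hits"
echo "$keys_hits"

rm -rf "$demo"
```

The trade-off is the usual one: grep reads every file on every search, while an index costs time and disk space up front and has to be kept current as files change.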

Clever idea, sort of. The success or the failure of this idea hinges on 1) users' inability or unwillingness to organize their own data, 2) users keeping their data on a local "desktop" and 3) a homogenized and agreeable notion of what constitutes a "desktop". At first blush, my suspicion is that the desktop search idea will wither, as the demographic most likely to be interested in desktop searching is already skilled at data organization. Furthermore, the savvy user probably does not store his or her most important documents on a local hard drive, for security and backup reasons. With the variety of desktop environments available, as well as the wider variety of filesystems available, it is hard to imagine that we could derive a generally agreeable and predictably static conception of "the desktop". Given these simple observations, I suspect that there is something else afoot in the push for desktop searching technologies.

Perhaps my first impressions stand in need of deeper analysis. Maybe it is the case that so-called power users are in control of such a vast array of data that they actually cannot figure out how to organize it in a manageable fashion. Again, I imagine that only higher-end users are in possession of enough data to warrant a local search engine to find it. This is above and beyond the already-available tools; in the Microsoft world, for example, "search" is available when you right-click on a drive letter in "My Computer".

Data Structure

How about a quick and dirty structure for the sort of data that I might store on my hard drive:

/home
	./documents
		./writings
			./philosophy
			./fiction
		./documentation
		./papers
		./articles
			./published
			./pending
			./drafts
			./complete
	./multimedia
		./audio
			./mp3
			./ogg
			./wav
			./artists
				./charlie.hunter
				./norah.jones
				./flogging.molly
		./graphic
			./gimp
				./logos
			./ImageMagick
		./video
			./vacation
			./promotions
		./photo
			./vacation
			./experimental
	./programs
		./java
		./php
		./jsp
	./install
		./bbgallery
		./lame
		./firefox
		./fluxbox
	./sites
		./theBlueSmokeBand
			./home
			./images
			./mp3
		./mortgageminds
			./home
			./images
			./sql
		./phil7
			./home
				./jsp
				./java
			./images
	./business
		./designs
			./cad
			./procedures
		./accounting
			./2003
			./2004
				./tax
				./invoices
		./contracts
			./pending
			./completed

Here I have six main categories of data: documents, multimedia, programs, install, sites and business. These correspond to the broadest activities in which I engage while using the computer. When using the computer for multimedia purposes, for example, I might be using audio, graphic, video or photo applications. And so on. This structure, I think, is really simple. If I need to find a song by Charlie Hunter, no problem: I know that it's multimedia, and that it's audio, and that he's an artist and, well, there's Charlie Hunter. My suspicion is that a local search engine is not going to tell me anything more than what I already know.

Wrong? The local search engine, when I ask it to find Charlie Hunter, will also find this document, which will be somewhere in the "sites" structure. It should also return a reference to some papers I wrote for grad school in which I used Charlie Hunter as an example (long story, that is). Presumably, the desktop search tool will use an algorithm to rank the results so that I would see .... Well, what would I see first? Would it be the audio results or the philosophy results? Hard to say, isn't it? And moreover, not particularly helpful.
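The ranking worry is easy to reproduce. Below is a rough sketch that ranks files by how many lines mention the search term. That scoring rule is my own stand-in, since I have no idea what algorithm a real desktop search tool uses, and the files are made up for the example. It also assumes the paths contain no colons, because grep uses a colon to separate path from count.

```shell
#!/bin/sh
# Hypothetical files: one audio playlist, one grad-school essay, one unrelated paper.
demo=$(mktemp -d)
mkdir -p "$demo/multimedia/audio" "$demo/documents/papers"
printf 'Charlie Hunter\n' > "$demo/multimedia/audio/playlist.txt"
printf 'Charlie Hunter as example\nMore on Charlie Hunter\nCharlie Hunter again\n' \
  > "$demo/documents/papers/essay.txt"
printf 'nothing here\n' > "$demo/documents/papers/other.txt"

# grep -c prints "path:count" (count of matching lines) for every file.
# Drop the zero-count files, then sort by the count, largest first.
ranked=$(grep -rci 'charlie hunter' "$demo" | grep -v ':0$' | sort -t: -k2,2nr)
top=$(printf '%s\n' "$ranked" | head -n 1)
echo "$ranked"

rm -rf "$demo"
```

With this naive scoring the essay outranks the playlist, even though the music is what I was after. The directory structure already knew the difference.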

My contention, here, is that the desktop search tool will be subject to the same relevancy concerns as every other search engine on the Internet. A well-organized hard drive already addresses what search engines cannot: relevancy. The Internet is not a well-structured or well-organized repository, hence the need for a search engine. My hard drive: very well organized.

Fine, so not everyone is as retentive as me when it comes to organizing data on a hard drive. Perhaps a more normal structure is:

/home
	/stuff
	/download
	/more.stuff
	/old.stuff

If this is the case, then bully for the desktop search tool. But recall the suspicion that the desktop search tool is tailored for higher-end users. Users with a directory structure like that above are likely not concerned with extravagant data-mining. Plus, it's hard to imagine less-than-extravagant users needing much more than the already-existing local search utilities.

Data Location

All of the above is rendered moot in the face of users wise enough to store their data on remote servers that are backed up regularly. The local hard drive: not a safe place for data. Give me a remote RAID array with a good nightly backup (tar?) and I'm warm and fuzzy about not losing data. To search the remote drive: why not use grep?

To push this a bit further, I'm consistently mystified by those who lose data due to hardware crashes and viruses and whatnot. To any users who store crucial data on a local PC hard drive, I gently suggest: do regular backups! And if possible, store data remotely on a secure server. Again, I suggest that the power-est of the so-called power users in the world already do this. What good does a hard drive searching tool do them? None.

For a tangible example, let's say that I need to find programs that I've written that use Java's TreeMap data structure:

     grep -R 'TreeMap' ./programs/java | less

There we go: there they all are. There's my handy dandy desktop search tool in a few keystrokes. Heck, I'd have to type "TreeMap programs" into a hard drive search tool anyway — what's a few more keystrokes at a command line versus in a desktop toolbar?
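A couple of variations on that command, as a hedged sketch. It assumes GNU grep for -R and --include, and the two Java files are hypothetical stand-ins for my programs directory.

```shell
#!/bin/sh
# Hypothetical stand-in for ./programs/java.
demo=$(mktemp -d)
mkdir -p "$demo/programs/java"
printf 'import java.util.TreeMap;\nTreeMap<String, Integer> m = new TreeMap<>();\n' \
  > "$demo/programs/java/Index.java"
printf 'import java.util.HashMap;\n' > "$demo/programs/java/Cache.java"

# -l  list only the files that use TreeMap
# -w  match TreeMap as a whole word (so "MyTreeMapper" would not count)
files=$(grep -Rlw --include='*.java' 'TreeMap' "$demo/programs/java")

# -n  show line numbers; here we just count the matching lines
hits=$(grep -Rnw 'TreeMap' "$demo/programs/java" | wc -l | tr -d ' ')

echo "$files"
echo "$hits"

rm -rf "$demo"
```

The first form answers "which programs?", the second answers "where in them?".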

To their credit, Google does make an effort to tell us why we need this new tool:

2. Why is this useful?
Since you can easily search information on your computer, you don't need to worry about organizing your files, email, or bookmarks. You can just do a quick search for what you remember seeing, instead of having to remember exactly what file, email, or web page had that information, and where that item is now located on your computer.

This addresses both points about structure and location: structure will become irrelevant and location will be pushed to local. The former point suggests that "organizing your files, email, or bookmarks" has got us worried. The latter point serves to entrench a behavioral structure, if you will: that data goes on "the desktop".

The fact that Google is so quick to point out what they consider to be the use of the tool suggests that they are making an effort to create a perceived need. In an entirely unscientific way, I've been asking around about what people I know think of desktop search tools. My fiance responded "what's the point of that?" That was my first reaction. And that is a good general summary of the reactions I've gotten from all of my closest geek friends. I have not heard a reaction akin to: "hey, that would be helpful because I'm always worried about organizing my files." Yet, this is the first point that Google makes. Really, they're trying to get you to think about organizing your files and to think of all those times that you couldn't remember the exact name of a file (and presumably couldn't use the search functions built into your operating system). If they (and others) successfully massage our perceptions, their tool will become indispensable.

What exactly is a desktop?

The notion of a digital desktop is, in my eyes, contentious. The term gets bandied about fairly casually as if every computer user agrees on what "the desktop" is.

If it is the case that the designers of these search tools intend to talk about the interface through which users interact with the operating system, then the ultimate desktop search tool is, quite simply, one's eyes. This is particularly true in the Windows world where there is only one view, if you will, as opposed to the UNIX world where multiple desktops are the norm. (Yes, I have heard that you can obtain "power tools" for Windows that allow for multiple desktops. They are weak.)

It is fair to assume that the designers do not have such a superficial conception of the digital desktop in mind. Or perhaps it is disingenuous of me to lean so hard upon the popular jargon-choice. Yet I think that it is significant that the designers claim to have come up with ways to search our "desktops".

A possibly broader understanding of the notion of a "desktop" involves the location of personal files, that is, files to which no other users have access. If this is the case, then talking about a searching mechanism for those files is perfectly reasonable. However, the push to replace existing mechanisms with a new tool relies on a systematic failure of the currently available tools, e.g., grep and smart data organization. In other words, the successful marketing of this tool will rely on specifying a problem that the tool addresses. I cannot find that problem. Desktop search tool makers will create that problem.

Furthermore, the introduction of a searching mechanism that will work for all "desktop" configurations relies on entrenching the currently available configurations. I argue that entrenchment, for example the way that Internet Explorer is entrenched in Windows, is dangerous. If the desktop search tool "ties" various programs together, and we start to rely heavily on this tool, then we really end up relying on the ties that bind the components that currently make up the digital desktop. This is precisely what makes Microsoft's products so ubiquitous, so unstable and, frankly, so breakable. I fear for the necessary homogenization that will accompany the introduction of a tool of this nature.

The purpose

Given these things, I am truly, honestly, deeply puzzled by the variety of heavy-hitting companies that are working furiously to develop mysterious desktop search tools. Then I read a blurb from Gartner Research indicating that Google's search tool should not be used in corporate environments. The reasoning involved the fact that it's not clear what information gets sent to Google when the user performs a search. Google released a response that their desktop search tool is in its beta phase and so is not touted as an enterprise-ready solution. And added that they only collect....

Stop there. It is of no consequence what they say that they collect. The fact remains, they collect.

The clouds have parted and the sun shines through: I think I have found my answer to "What is the use of a desktop search tool?" It is not a data searching tool: it is a data collection tool. User searching benefit, which will get couched in terms of "productivity" soon I wager, is a side effect of the more lucrative purpose of gathering and distributing marketing information to better aim advertising at specific user groups. Or more: to better keep track of who does what on their own computer.

Conspiracy?

At this point, readers might be concerned that I'm digging my heels into a good old fashioned conspiracy theory. Could be. But Google suggests differently. In an effort to allay users' fears about use of personal information, they say:

Your computer's content is not made accessible to Google or anyone else without your explicit permission

Compare and contrast this with what they say further down the same page:

By default, Google Desktop Search collects a limited amount of non-personal information from your computer and sends it to Google. This includes summary information, such as the number of searches you do and the time it takes for you to see your results, and application reports we'll use to make the program better. You can opt out of sending this information during the installation process or from the application preferences at any time.

The "tool" does indeed collect information. All that Google has promised is that content will not be made available. This says nothing of statistics that they might compile, e.g., the location of your computer and what sorts of searches you do. How interesting might this be to, say, Wal-Mart? Or to insurance companies? None of this is "your computer's content", and hence is not covered by the agreement.

To reiterate: they are not sending your personal information to anyone in particular, but marketers and the like are not interested in you; they are interested in trends. As a user of a desktop search, you become a volunteer member of a particular demographic sample and marketers will take a great interest in Google's capacity to extract information from that sample. Rest assured, the Googles and Yahoo!s and MSNs of the search world will be paid handsomely for the information that you collect on their behalf. Consider the cash that marketing firms will save on focus groups: again, Desktop Search Tools of this nature make you a volunteer in what amounts to the world's most enormous focus group.

Is this really what they do? By default, yes. You need to ask them to not do it. This simple fact should help to indicate the underlying motivation for the service. Witness:

If you send us non-personal information about your Google Desktop Search use, we may be able to make Google services work better by associating this information with other Google services you use and vice versa. You can opt out of sending such non-personal information to Google during the installation process or from the application preferences at any time.

Again a reiteration: the default behavior, Google clearly states, is to gather information, in the guise of making things work better. It is valuable to ponder what would count as a Google service "working better". Better targeted advertisements are certainly a start, aren't they? And another thing: what in God's name do they mean by "vice versa"? If the point is that Google can make things work better for you, then vice versa is that you can make things work better for them, isn't it?

Look at how many huge companies are competing for your desktop searching business. Why are they so intensely interested in your worries about file organization?

This is a philosophical introduction to a broader article to be continued....

© 2004 Sorrell
December