Monday, September 19, 2005

Inhuman Agency Insufficient: Counterpoint

I'm a pathological Google user, and I'm quick to praise the company and its tools. But even its excellent search engine rewards (and occasionally requires) finesse to find the desired results. Understanding that [cook bake pie] and [pie bake cook] will bring up similar results, but differently ordered, is important. This is the perennial trade-off of abstraction layers: they can make information more accessible, but less accountable for its origins. Because the Google search engine second-guesses my intent, I have to second-guess it in return when I form my query. This is not a complaint; because the Google engine constantly adapts its output to the human use of information, rather than to the information itself, I suspect that the Google method has more promise in this area than its major proposed near-future companion technology, the Semantic Web.

The Semantic Web, for all of its interesting propositions and potential benefits, has a major weakness in this analysis. By making more information machine-readable, and proposing (necessitating, actually) that machine agents then assume a greater role in information search/retrieval/presentation, it removes a critical aspect of quality control. The visual presentation of a web page provides countless subtle clues that inform our assessment of its quality, reliability, and authenticity. There are a thousand shoddy-looking scam sites out there for every rare convincing one. Likewise, stripping the immediately desired information out of a particular source's flow of presentation removes important context clues. Even the most paranoid alien-conspiracy site will occasionally produce a bit of output that sounds sane once it has been lifted from its source. But the source is important, and many of the most important qualities of a source can't be analyzed with metatags.
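To make that concrete, here is a toy sketch in plain Python of the shape Semantic Web metadata takes: subject-predicate-object triples that a machine agent can query. Everything here is invented (the sites, the triples, the little query helper), and it uses no real RDF tooling; the point is only that once a claim is reduced to a well-formed triple, the garish layout and breathless tone that would warn a human reader are exactly what gets left behind.

    # Toy illustration: "facts" as subject-predicate-object triples,
    # the rough shape of Semantic Web metadata. All data is invented.
    triples = [
        ("ufo-truth-now.example", "claims", "object sighted over Nevada"),
        ("ufo-truth-now.example", "author", "anonymous"),
        ("observatory.example",   "claims", "object sighted over Nevada"),
        ("observatory.example",   "author", "staff astronomer"),
    ]

    def query(predicate, obj):
        """Return every subject that asserts (predicate, obj)."""
        return [s for (s, p, o) in triples if p == predicate and o == obj]

    # A machine agent sees two equally well-formed assertions; nothing in
    # the markup records which source a careful human would trust.
    print(query("claims", "object sighted over Nevada"))
    # ['ufo-truth-now.example', 'observatory.example']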

This is not to suggest that the SW won't happen, or that it shouldn't. But it won't be a magical cure for the world's information-organization problems. Google, of course, will be among the first to put new semantic metadata to good use, which will probably be a big help. And of course the two methods aren't mutually exclusive. But the Google approach says: let us have a look at everything, however arbitrarily arranged, and we'll make it useful by observing how it is used and basing our output on that. The SW says: follow our recommendations for how your information should be marked, and if everyone complies, the information will automatically be more useful. But metatagging everything won't, on its own, make information organization easier, and it certainly won't ease quality assessment. On the contrary, it will require an ever-expanding infrastructure to manage all that markup. The lengthy debate over HTML use and misuse (re: standards) will simply move one layer down, behind the markup. Human, contextual assessment will remain the final judge of information's value, and we should be careful not to remove it from the picture, whether accidentally or intentionally, in the name of simplification, efficiency, or usability.
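For a very rough caricature of that contrast, consider the toy Python below. Neither half resembles how Google or any Semantic Web agent actually works, and all of the sites and numbers are made up; it only illustrates the difference between ranking by observed use and trusting whatever tags a publisher declares about itself.

    # Toy contrast of the two philosophies; all data is invented.
    docs = {
        "recipe-blog.example": {"tags": ["pie", "baking"], "return_visits": 31},
        "tag-spam.example":    {"tags": ["pie", "baking"], "return_visits": 2},
    }

    def rank_by_usage(tag):
        """Loosely Google-ish: order matches by how people actually use the pages."""
        hits = [d for d, info in docs.items() if tag in info["tags"]]
        return sorted(hits, key=lambda d: docs[d]["return_visits"], reverse=True)

    def rank_by_declared_metadata(tag):
        """Loosely SW-ish: trust the publisher's own markup."""
        return [d for d, info in docs.items() if tag in info["tags"]]

    print(rank_by_usage("pie"))              # ['recipe-blog.example', 'tag-spam.example']
    print(rank_by_declared_metadata("pie"))  # both match equally well; the markup alone can't tell them apart

The second ranker is only as good as the publisher's honesty and diligence, which is the quality-assessment gap described above.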