On Tagging

I’ve been talking with a few people recently about the phenomenon of tagging (not the graffiti kind), or “folksonomies,” if you will. The common opinion seems to be skepticism: organization by user-created “tags” or keywords is inherently flawed because different users will tag the same thing differently. What happens when one person tags a photo of a VW bug with “automobile” and another tags it with “car”? To me that is the beauty of the system: the diversity of classification. Chances are another person, or better yet several people, will end up tagging it with both “automobile” and “car”. The system now has enough information to know that in some ways “automobile” and “car” are related and should maybe be placed in the same group.
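
A minimal sketch of how a system could put a number on that “automobile”/“car” overlap. It assumes a hypothetical list of tag sets (one per user tagging the same photo) and uses Jaccard similarity as the affinity measure; that measure is my choice for illustration, not necessarily what any real site uses:

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: the tags four different users put on the same VW bug photo.
taggings = [
    {"automobile", "vw"},
    {"car", "vw"},
    {"automobile", "car", "beetle"},
    {"car", "automobile"},
]

def tag_affinity(taggings):
    """Jaccard affinity for every pair of tags that ever appear together.

    affinity(a, b) = taggings with both a and b / taggings with a or b
    """
    single = Counter()  # how many taggings use each tag
    pair = Counter()    # how many taggings use each (sorted) pair of tags
    for tags in taggings:
        single.update(tags)
        pair.update(combinations(sorted(tags), 2))
    return {
        (a, b): n / (single[a] + single[b] - n)
        for (a, b), n in pair.items()
    }

affinity = tag_affinity(taggings)
print(affinity[("automobile", "car")])  # 0.5
```

The two tags appear together on half of the taggings that use either one, so the score is 0.5. A number like this measures affinity only; it says nothing about what kind of relationship the tags have.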

Flickr has figured this out with their clusters. Here’s a good example of Flickr’s clusters doing their thing, with the word “beetle.” We get one cluster for Volkswagens and one for insects. There’s also a third cluster thrown in, built around the word “macro,” that looks a lot like the insects cluster. Is it perfect? No, but it’s not bad.
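
The post doesn’t say how Flickr builds its clusters, so this is only a guess at the general idea, sketched under assumptions: take the photos tagged with the query word, link two other tags when they appear together on at least `min_count` of those photos, and treat each connected group of tags as a cluster. The photo data and `tag_clusters` are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical photos, each represented by its set of tags.
photos = [
    {"beetle", "vw", "car"},
    {"beetle", "vw", "car"},
    {"beetle", "volkswagen", "vw"},
    {"beetle", "volkswagen", "vw"},
    {"beetle", "insect", "bug"},
    {"beetle", "insect", "bug"},
    {"beetle", "insect", "macro"},
    {"beetle", "macro", "closeup"},
    {"beetle", "macro", "closeup"},
]

def tag_clusters(photos, query, min_count=2):
    """Group the tags that co-occur with `query` into connected clusters."""
    pair_counts = Counter()
    tags_seen = set()
    for tags in photos:
        if query not in tags:
            continue
        others = sorted(tags - {query})
        tags_seen.update(others)
        pair_counts.update(combinations(others, 2))

    # Union-find: every tag starts in its own cluster.
    parent = {t: t for t in tags_seen}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    # Merge clusters only along sufficiently frequent co-occurrences.
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            parent[find(a)] = find(b)

    clusters = {}
    for t in tags_seen:
        clusters.setdefault(find(t), set()).add(t)
    return sorted(clusters.values(), key=lambda c: sorted(c))

print(tag_clusters(photos, "beetle"))
```

With `min_count=2` the hypothetical data splits into three clusters (VW, insects, macro). Dropping to `min_count=1` lets the single photo tagged with both “insect” and “macro” merge those two clusters, which shows how thin the line between them can be.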

How does del.icio.us handle things? When you add a new bookmark or link to your del.icio.us account, they provide a brief list of suggested tags based on the tags other users have entered. This helps with some of the ambiguity, but again it isn’t foolproof. When you’re adding a site that has never been bookmarked on del.icio.us before, well, you’re on your own. Or head to Tagyu to get some recommendations.
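
A rough sketch of frequency-based tag suggestions in that spirit, assuming a hypothetical in-memory store of other users’ tags per URL. Whatever del.icio.us actually does isn’t described here, so `bookmarks` and `suggest_tags` are illustrative only:

```python
from collections import Counter

# Hypothetical store: URL -> one tag set per user who bookmarked it.
bookmarks = {
    "http://example.com/vw-bug": [
        {"car", "vw"},
        {"automobile", "vw", "classic"},
        {"car", "classic"},
    ],
}

def suggest_tags(url, limit=3):
    """Return the tags other users applied most often to `url`."""
    counts = Counter()
    for tags in bookmarks.get(url, []):
        counts.update(tags)
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:limit]]

print(suggest_tags("http://example.com/vw-bug"))  # ['car', 'classic', 'vw']
```

A URL nobody has bookmarked yet gets an empty list, which is exactly the “you’re on your own” case.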

Despite the trouble of relying on imperfect people to classify information on their own, I still think that tagging is where we’re headed in the next big search revolution. In fact, if you think about what Google did with its search and PageRank, it’s basically a “folksonomy.” Website A links to Website B with the word “automobiles.” Website B now gets a PageRank boost from that link, and the link text tells Google what B is about, so B ranks higher for that word. Who would ever have thought that would make for a good way to search the Internet? For better or worse (but mostly better), Google has treated hyperlinks as a kind of tag, or descriptor. The flaws in this system are obvious. I mean, look, I just linked to a Wikipedia entry about Google-bombing, using the word “worse.” Does that entry have anything to do with “worse”? Not really, or only by a stretch.
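
To make the hyperlink-as-tag analogy concrete, here is a toy sketch under stated assumptions: a hypothetical three-page web, a textbook power-iteration PageRank (damping 0.85), and an index that treats each link’s anchor text as a tag on the target page. It is not Google’s actual ranking, just the two ideas side by side:

```python
# Hypothetical web: (source page, target page, anchor text).
links = [
    ("A", "B", "automobiles"),
    ("C", "B", "cars"),
    ("B", "C", "worse"),
    ("C", "A", "home"),
]

def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; assumes every page has at least one outlink."""
    pages = sorted({p for s, t, _ in links for p in (s, t)})
    out = {p: [t for s, t, _ in links if s == p] for p in pages}
    rank = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p in pages:
            # Each page splits its current rank evenly across its outlinks.
            for t in out[p]:
                new[t] += damping * rank[p] / len(out[p])
        rank = new
    return rank

def anchor_index(links):
    """Treat each link's anchor text as a tag on the target page."""
    index = {}
    for _, target, text in links:
        for word in text.lower().split():
            index.setdefault(word, set()).add(target)
    return index

def search(word, links):
    """Pages 'tagged' with `word` by inbound links, highest PageRank first."""
    rank = pagerank(links)
    hits = anchor_index(links).get(word.lower(), set())
    return sorted(hits, key=lambda p: -rank[p])

print(search("automobiles", links))  # ['B']
```

`search("worse", links)` returns `["C"]` whether or not page C has anything to do with “worse,” which is the Google-bombing problem in miniature.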

Whichever big internet player grabs an established system like del.icio.us next (I’m rooting for Yahoo!) will have a whole heck of a lot of classified and categorized websites, ripe for integration with next-generation search. We’ll start to see more blogging software with tags built in, and photos, links and music all tagged and interconnected. I can’t wait to see the results. Not to mention the dawn of tag-spam. Spag?


  1. affinity != equivalence =(

  2. I don’t think I said it did.

    affinity (sometimes) ≅ equivalence
    affinity (more often) ≅ relation
    and certainly,
    affinity (sometimes) != anything

    Nothing’s perfect.

  3. Well, you said: ‘The system now has enough information to know that in some ways “automobile” and “car” are related and should maybe be placed in the same group.’

    What the system has is the knowledge that there is an affinity between automobile and car. That could mean equivalence, it could mean relation, or it could mean nothing. So concluding that “[they] are related and maybe they should be placed in the same group” isn’t really correct, is it?

    I spent a little time on flickr looking for commonalities between words like “leaf” and “leaves” and “tree” and “fall” and “autumn,” and it looks to me like flickr doesn’t have the capability to determine equivalence or relation based on overlapping keywords or dictionaries. Whatever folksonomy is happening there is being twisted by flickr’s “interestingness” rating and by editorial intervention; it’s (at least partly) a controlled vocabulary. The fact that it’s at least partly self-controlled is kinda interesting but will no doubt lead to bizarre results (like you said, googlebombing is not possible without folksonomies, and people are perverse).

  4. lopolis,

    Semantics. Instead of “are related” I should have used “have an affinity.”

    Yeah, Flickr’s system seems to add weighting with other variables such as their “interestingness.” I still wonder how much of it is automated as opposed to editorial intervention. If “interestingness” is calculated from the most-viewed and most-commented images, and weighted accordingly within a cluster, the result would be a hierarchy of the “best” results. Prime (as in juicy, not unfactorable) algorithms and logic to apply to the next search engine.
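
On the equivalence-versus-relation question raised in the comments: one way a system could try to tell the two apart is to check an observed affinity against a dictionary before trusting it. The sketch below assumes a tiny hypothetical synonym list, a deliberately crude plural rule, and an affinity score (say, a co-occurrence ratio) computed elsewhere; `classify_affinity` and its threshold are illustrative, not anything Flickr is known to do:

```python
# Hypothetical synonym groups; a real system would need a full thesaurus
# or a controlled vocabulary.
SYNONYMS = [
    {"automobile", "car"},
    {"autumn", "fall"},
]

def normalize(tag):
    """Crude singularization, e.g. 'leaves' -> 'leaf' (not real stemming)."""
    tag = tag.lower()
    if tag.endswith("ves"):
        return tag[:-3] + "f"
    if tag.endswith("s") and not tag.endswith("ss"):
        return tag[:-1]
    return tag

def classify_affinity(a, b, affinity, threshold=0.3):
    """Label an observed tag affinity as 'equivalence', 'relation', or 'none'."""
    a, b = normalize(a), normalize(b)
    if a == b or any({a, b} <= group for group in SYNONYMS):
        return "equivalence"
    if affinity >= threshold:
        return "relation"
    return "none"

print(classify_affinity("leaf", "leaves", 0.1))  # equivalence
print(classify_affinity("tree", "leaves", 0.6))  # relation
```

The dictionary still can’t tell “fall” the season from “fall” the verb; tags alone leave that ambiguity unresolved.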

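On the “interestingness” speculation in the last comment: if it really were just views and comments, weighting a cluster by it might look like this sketch. The score formula, the comment weight of 10, the `cluster_affinity` field, and the sample photos are all assumptions for illustration, not Flickr’s formula:

```python
from dataclasses import dataclass

@dataclass
class Photo:
    title: str
    views: int
    comments: int
    cluster_affinity: float  # hypothetical 0..1 match between photo tags and the cluster

def interestingness(photo, comment_weight=10.0):
    """Hypothetical score: views plus heavily weighted comments."""
    return photo.views + comment_weight * photo.comments

def rank_cluster(photos):
    """Order a cluster's photos by affinity scaled by interestingness."""
    return sorted(
        photos,
        key=lambda p: p.cluster_affinity * interestingness(p),
        reverse=True,
    )

photos = [
    Photo("ladybug macro", views=120, comments=8, cluster_affinity=0.9),
    Photo("beetle on leaf", views=400, comments=1, cluster_affinity=0.6),
    Photo("blurry bug", views=30, comments=0, cluster_affinity=0.8),
]
print([p.title for p in rank_cluster(photos)])
```

Multiplying by `cluster_affinity` keeps a wildly popular but loosely related photo from dominating a cluster; an additive score would let raw popularity win more often.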
