A Problem with Tagging

By Deane Barker on June 13, 2005

I wonder how long before the whole tagging phenomenon jumps the shark? I like it and everything, but have a sneaking suspicion that we’re going to come full circle back to taxonomies.

We’ve talked about taxonomies before – these are the big parent-child tree structures that have traditionally defined information architecture. Tagging is a direct response to the complication and “monolithic-ness” of the taxonomy – instead of defining the entire tree, you just label the one leaf that you’re working with.

But what happens when the tree starts creeping back?

For instance, one of the drawbacks with tagging is that people have different names for the same things. What I call “automotive,” you might call “cars,” so our entries don’t appear under the same tag. Have this happen enough times, and it gets annoying.

How do we get around this? Well, let’s create a thesaurus then. Let’s tell the system that “cars” and “automotive” are more or less the same thing, so if someone searches for anything tagged “automotive,” return anything tagged as “cars” as well. Awesome – now we’re back in action, even though we have a bit of a top-down system to maintain. It’s a small price to pay.
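To make the thesaurus idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `SYNONYM_GROUPS` list, the `expand_tag` and `search` helpers, and the sample items are all made up, and a real system would keep this data in a database rather than in memory.

```python
# Hypothetical synonym groups; a real system would load these from storage.
SYNONYM_GROUPS = [
    {"cars", "automotive"},
]

# Hypothetical sample data: (title, tags) pairs.
ITEMS = [
    ("Changing your oil", ["automotive"]),
    ("Best hatchbacks", ["cars"]),
    ("Sourdough basics", ["baking"]),
]


def expand_tag(tag):
    """Return the tag plus every tag declared equivalent to it."""
    expanded = {tag}
    for group in SYNONYM_GROUPS:
        if tag in group:
            expanded |= group
    return expanded


def search(items, tag):
    """Return titles of items carrying the tag or any of its synonyms."""
    wanted = expand_tag(tag)
    return [title for title, tags in items if wanted & set(tags)]


print(search(ITEMS, "automotive"))
```

Searching for “automotive” here also picks up the item tagged “cars,” which is the whole point. The cost is that somebody has to maintain those synonym groups.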

But what happens when someone wants to broaden their search beyond just a simple tag? Instead of just automotive-related items, I want to find anything to do with vehicles.

How do I back up from “automotive” to “vehicles”? Well, we need to tell the system that “cars” is a *child* of “vehicles.” For that matter, “vehicles” holds more than just cars: it is really a *parent* of “planes,” “trains,” and “boats” too. No problem, we just need to create a recursive table that tracks how tags are related to each other, like...a taxonomy of tags...
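As a rough sketch of what that recursive table might look like, here it is as a child-to-parent mapping in Python. The `PARENT_OF` dictionary and the `descendants` and `ancestors` helpers are hypothetical names of my own; in a database this would more likely be a self-referencing table queried recursively.

```python
# Hypothetical child -> parent table built from the examples in this post.
PARENT_OF = {
    "cars": "vehicles",
    "planes": "vehicles",
    "trains": "vehicles",
    "boats": "vehicles",
    "hydrofoils": "boats",
}


def descendants(tag):
    """Return the tag and every tag beneath it in the tree."""
    found = {tag}
    frontier = [tag]
    while frontier:
        current = frontier.pop()
        for child, parent in PARENT_OF.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def ancestors(tag):
    """Walk up from a tag toward the root, returning the chain of parents."""
    chain = []
    seen = {tag}
    while tag in PARENT_OF:
        tag = PARENT_OF[tag]
        if tag in seen:  # guard against an accidental cycle in the table
            break
        seen.add(tag)
        chain.append(tag)
    return chain


print(sorted(descendants("vehicles")))
print(ancestors("hydrofoils"))
```

Broadening a search to “vehicles” then means matching any tag in `descendants("vehicles")`, and backing up from a narrow tag means walking `ancestors`.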

And, with that, we’ve come full circle back to the top-down taxonomy. Wow, that was quick.

I don’t think this is so bad, because it still has some advantages. The editing interface for tags (a simple textbox) is much better than the mess we get with parent-child stuff (usually a huge list of checkboxes).

Additionally, tag-based organization can kind of define itself. Instead of sitting around thinking up a huge taxonomy before you get started, you can watch the tags that come rolling through the system and just organize them as they come in. (“Oh look, another tag for ‘hydrofoils.’ Maybe we can stick that under ‘boats.’”)

Finally, if you’re really anal retentive, you can “normalize” the tags as they get applied. When an item gets submitted with “cars” and “howto” assigned, you can detect and change them to “automotive” and “tutorial” if you like. Be sure to notify the user, however, so they know where to find the thing when they go looking for it (or just make sure the thesaurus has the correct relationships defined).
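Normalizing on submission could look something like the sketch below. The `CANONICAL` mapping and the `normalize_tags` function are hypothetical, and trimming whitespace and lowercasing are extra assumptions of mine rather than anything tagging systems are required to do. The function returns the list of changes alongside the cleaned tags so the user can be told what happened.

```python
# Hypothetical mapping from a submitted tag to the preferred tag.
CANONICAL = {"cars": "automotive", "howto": "tutorial"}


def normalize_tags(tags):
    """Return (normalized_tags, changes); changes lists each (old, new) rewrite."""
    normalized = []
    changes = []
    for tag in tags:
        cleaned = tag.strip().lower()  # assumption: tags are case-insensitive
        preferred = CANONICAL.get(cleaned, cleaned)
        if preferred != tag:
            changes.append((tag, preferred))
        if preferred not in normalized:  # avoid duplicates after rewriting
            normalized.append(preferred)
    return normalized, changes


tags, changes = normalize_tags(["cars", "howto"])
for old, new in changes:
    print(f"Your tag '{old}' was filed under '{new}'.")
```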

Anyone have thoughts on this? Am I just trying to rain on the tagging parade?

Comments (4)

Johnny says:

You’re not trying to rain on the parade at all; you’re saying something that needs to be said. I can’t believe you’re the only one saying it. Your examples are even more extreme than they need to be. For instance, if I’m tagging bookmarks on del.icio.us, do I use blogs or blog? This type of thing could be standardized with some style guidance – e.g., “try not to pluralize tags unless the tag doesn’t make sense otherwise.”

Some sites, notably Technorati, try to remedy the lack of a hierarchy with “related tags” – i.e., searching for the other tags people used in conjunction with a given tag. The results are often comical, occasionally useful. The relations are still free-associational, and they’re nothing approaching the neat conceptual hierarchy you envision. For instance, if you’re looking at the tag “cars” in Technorati, the related tags are half synonyms for “cars” and half tangentially-related topics. This addresses the problem of cars vs. auto that you discussed, but not the parent-child issue. I highly doubt that “vehicles,” “transportation,” etc., will ever show up in the related tags.

The answer, in my mind, is that if you’re dealing with serious data, a “folksonomy” is a poor solution. You’re basing your ability to retrieve information on the vagaries of free association. That said, when your interests are more general, tags have proven to be an excellent solution.

So why does it have to be an either/or proposition? Why not use tags in tandem with a more rigidly defined taxonomy? In the case of blog entries, for instance, you could have the authoritative categories but also tag each post with more whimsical keywords that strike you at the moment. You could keep the organizational utility of categories while allowing your blog to be searchable by tags, as well.

Along these lines, I think the new Movable Type tags plug-in misses the boat. This plug-in allows you to enter tags in the standard textbox, but when you’re done, it creates categories to match all the tags you typed in. So it’s essentially just applying a tag interface to the pre-existing MT category structure. This is of limited appeal to me – tags are a different concept than categories, so they should employ their own infrastructure. With this plug-in, I’m going to end up with a mess of dozens of categories – we’ve all seen that type of nonsense, with tag-type “categories” dripping down a blog’s sidebar, most of them housing only one or two posts – and lose any ability to reasonably organize things again.

Tags are tags, and they have their purpose, but they are not the be-all end-all. Let’s not turn them into categories; it’s a self-defeating exercise for the reasons you laid out. Thanks for chiming in on the issue. It’s an important one.

ardief says:

I think you raise very interesting points, and am slightly disappointed that the Technorati related tags don’t work too well. As someone who works in Natural Language Processing, I can say that the general issue of tags and word relatedness and hierarchies is a problem that is tackled a lot in the community, but I suspect that many researchers haven’t ‘discovered’ folksonomies yet and therefore haven’t directed their efforts specifically in that direction...yet. Hopefully as awareness of this grows we can see research more tailored towards this.

MEL says:

I’ve thought about this some in trying to organize tags on my own site. Basically I think there needs to be both a top-down hierarchy and a bottom-up “folksonomy”. But in order for the latter to work you need to have people posting synonyms – both “car” and “auto”. So when you see the list of “related tags” for “car” you can think, “hey, there are probably other car-related posts over at ‘auto,’ but probably not so many under the related tag ‘fish’.”

The other thing we need is a better way to navigate the “tag cloud.” I’d love to do a TouchGraph kind of thing on my own site, but don’t know how to program it. But basically we should have a visual way to explore how tags relate to each other, either within one site or across the ‘sphere.

karl says:

He’s far from the only one saying it. :) There are a lot of problems with tags, from I18N to organization of information to search. :) But I guess telling people they will burn their hand does nothing until they really burn their hand (full circle).

I have written about exactly the same thing on my weblog, in French.

So you are not alone ;)

Now let’s use the tags to make a flexible taxonomy system ;)