To Structure or Not to Structure

By Deane Barker on December 7, 2007

I’ve talked a lot over the years about content modeling. Open and Closed Content Management is probably the most self-referenced post on this site. Recently I called content modeling one of the Four Disciplines of Content Management.

But, lingering behind all the questions about how to model something is a bigger question: do you model it at all? When is it obvious to structure some content, and when do you just throw it into the “WYSIWYG pile”?

We were meeting with a client the other day about applying some content management to their Web site. We came upon a page of “business partners.” It had a repeated HTML structure consisting of a logo for the partner, their name, their URL, and a few paragraphs about them. There were maybe a dozen or so partners listed.

It looked like this:

From a content modeling perspective, you have three ways to handle content like this:

  1. No structure
    This is a perfectly viable option, provided you don’t mind a TABLE: the HTML represented here is nothing a decent WYSIWYG editor couldn’t handle.
  2. Structured as a single content object
    For most systems, this means an XML document, with a repeating “partner” element, and sub-elements therein for “logo,” “name,” “url,” and “description.”
  3. Structured as multiple content objects
    You could create a “partner” content type, with fields for “logo,” “name,” “url,” and “description.” This page would be a rendering of multiple partner content objects, sequentially down the page.
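To make the three options concrete, here is a minimal Python sketch of what each might look like as data. The partner names, file names, and `.example` URLs are hypothetical, and a real CMS stores these shapes in its own way. The point is only that options 2 and 3 give code something to address, while option 1 is one opaque blob of HTML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Option 1: no structure. The page is one blob of editor-produced HTML.
unstructured_page = """
<table>
  <tr><td><img src="acme.png"></td>
      <td><b>Acme Corp</b><br><a href="https://acme.example">acme.example</a>
      <p>Acme makes widgets.</p></td></tr>
</table>
"""

# Option 2: a single content object with repeating "partner" elements.
single_object = ET.fromstring("""
<partners>
  <partner>
    <logo>acme.png</logo>
    <name>Acme Corp</name>
    <url>https://acme.example</url>
    <description>Acme makes widgets.</description>
  </partner>
  <partner>
    <logo>globex.png</logo>
    <name>Globex</name>
    <url>https://globex.example</url>
    <description>Globex makes gadgets.</description>
  </partner>
</partners>
""".strip())

# Option 3: a hypothetical "partner" content type; each partner is its own object.
@dataclass
class Partner:
    logo: str
    name: str
    url: str
    description: str

partner_objects = [
    Partner("acme.png", "Acme Corp", "https://acme.example", "Acme makes widgets."),
    Partner("globex.png", "Globex", "https://globex.example", "Globex makes gadgets."),
]

# Options 2 and 3 let code pick out one partner; option 1 would need HTML scraping.
second_from_xml = single_object.findall("partner")[1].findtext("name")
second_from_objects = partner_objects[1].name
```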

Not surprisingly, there are advantages and disadvantages for each, and we threw them around with this client.

No Structure

  • Advantages
    • The simplest, fastest, and cheapest thing to do.
    • It’s flexible. Structured content has to be rigid and consistent, but the client will invariably have an exception to the structure, and a WYSIWYG editor will let them do whatever they want.
    • It’s likely more usable for the end-user. They have content in their WYSIWYG editor that looks like it does on the page, and it’s tough to make more sense than that.
  • Disadvantages
    • You’re trusting the user to not screw up the formatting. That can be a huge leap of faith.
    • You can’t isolate the individual partner records. If they suddenly decided they wanted to feature a random partner on the home page, you can’t easily go pick one out of the list.
    • The images can be a problem if they have to be resized, given the generally poor handling of file-based content by WCMSs these days.
    • WYSIWYG editors don’t do CSS layouts well, so you may have to settle for some horked up HTML. The above layout, for instance, would be hard for the average content editor to do in anything but a big TABLE tag.

Structured as a single content object

  • Advantages
    • You can control the formatting at the template level, which means it’s much harder for the user to screw it up. You’ve taken the formatting out of the user’s hands and they’re now working with pure content.
    • You can programmatically resize and otherwise process the images.
    • The page remains a single content object, which is simpler for the end user (as opposed to multiple content objects, in the next scenario).
    • There’s now enough structure to get at the individual records if you need to.
  • Disadvantages
    • With structure comes rigidity. If the client wants an exception to the format for Partner X, it can become a complicated exception process, or it can be nigh impossible.
    • Changes to the format now become a developer concern, rather than an editor concern. If they suddenly decide they want to include a line for “location,” they can’t do that themselves without dual-purposing another field, which kind of defeats the purpose.
    • To effect this solution, you need to have a CMS that can handle repeating form elements in a content type. This usually means XML, and an editor that allows repeating XML elements.
    • While arbitrary ordering of records is simple (most XML editors will let you move child elements around within the parent), sorting based on a property can be problematic. Support for sorting in XSL is simplistic, and to sort otherwise would require you to parse the XML into some other, sortable data structure, sort it, then publish it from there, or put it back into XML to be transformed.
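The workaround in the last bullet (parse the XML into a sortable structure, sort it, then publish from there or put it back into XML) might look like this in Python, using only the standard library's `xml.etree.ElementTree` and `html` modules. The element names follow the example above; the partner data is hypothetical, and sorting case-insensitively by name is just one choice of property.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical single content object: repeating <partner> elements in editor order.
PARTNERS_XML = """
<partners>
  <partner><name>Globex</name><url>https://globex.example</url></partner>
  <partner><name>acme corp</name><url>https://acme.example</url></partner>
  <partner><name>Initech</name><url>https://initech.example</url></partner>
</partners>
""".strip()

def load_partners(xml_text: str) -> list[dict[str, str]]:
    """Parse the XML into a list of dicts, a structure Python can sort freely."""
    root = ET.fromstring(xml_text)
    return [
        {"name": p.findtext("name", ""), "url": p.findtext("url", "")}
        for p in root.findall("partner")
    ]

def render(partners: list[dict[str, str]]) -> str:
    """Publish straight from the sorted list; all markup lives in this template."""
    items = "\n".join(
        f'  <li><a href="{html.escape(p["url"])}">{html.escape(p["name"])}</a></li>'
        for p in partners
    )
    return f"<ul>\n{items}\n</ul>"

def to_xml(partners: list[dict[str, str]]) -> str:
    """Or put the sorted records back into XML to be transformed."""
    root = ET.Element("partners")
    for p in partners:
        el = ET.SubElement(root, "partner")
        for field in ("name", "url"):
            ET.SubElement(el, field).text = p[field]
    return ET.tostring(root, encoding="unicode")

# Sort on a property (here, name, ignoring case), then publish either way.
partners = sorted(load_partners(PARTNERS_XML), key=lambda p: p["name"].casefold())
page_html = render(partners)
sorted_xml = to_xml(partners)
```

The `render` function is also where the template-level formatting from the first advantage lives: editors only ever touch names and URLs, never markup.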

Structured as multiple content objects

  • Advantages
    • As before, you can control the formatting at the template level.
    • You can manage each partner record individually. For each one, this means you can permission them, version them, subject them to workflow, etc. You could even give each partner a login and let them manage their own record.
    • Getting at each of the records individually is as simple as possible.
    • Almost all CMSs can handle this type of structure (as opposed to the need for repeating elements in the prior scenario).
  • Disadvantages
    • As before, with structure comes rigidity.
    • Can be more complex for the end user. They have more than one object floating around. Some users might find this just as simple, some may find it complicated.
    • Depending on the CMS, ordering the records on the page can be a problem. You might want to arbitrarily order these records (rather than sorting on a property), and some CMSs do that better than others.
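With one object per partner, ordering, permissions, and picking out a single record become ordinary operations on a list. This is a minimal Python sketch, assuming a hypothetical `Partner` type with an editor-assigned `position` field for arbitrary ordering and an optional `owner` login for partners who manage their own record; a real CMS would expose these through its own permission and sorting features.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Partner:
    """One content object per partner (hypothetical "partner" content type)."""
    id: int
    name: str
    url: str
    position: int                # editor-assigned slot on the listing page
    owner: Optional[str] = None  # partner login allowed to edit this record, if any

PARTNERS = [
    Partner(1, "Globex", "https://globex.example", position=2, owner="globex-admin"),
    Partner(2, "Acme Corp", "https://acme.example", position=1),
    Partner(3, "Initech", "https://initech.example", position=3, owner="initech-admin"),
]

def listing_order(partners: list[Partner]) -> list[Partner]:
    """Arbitrary ordering: sort on the explicit position field, not on name."""
    return sorted(partners, key=lambda p: p.position)

def can_edit(user: str, partner: Partner, site_editors: set[str]) -> bool:
    """Site editors may edit any record; a partner login only its own."""
    return user in site_editors or user == partner.owner

def pick_featured(partners: list[Partner], rng: random.Random) -> Partner:
    """Feature one partner on the home page: trivial when each is its own object."""
    return rng.choice(partners)

home_page_partner = pick_featured(PARTNERS, random.Random())
```

Reordering the page then means changing `position` values, which is exactly the kind of arbitrary ordering some CMSs handle better than others.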

So, there’s a run-down of the advantages and the disadvantages of the major approaches. But which one to choose?

As you’d imagine, there’s no clear-cut answer. Here are some of the factors to consider:

  • To what extent do you need to sub-divide the content? In the above example, will you ever need to isolate a single partner?
  • If your answer to the above is “no,” how sure are you of that? How well does the end-user understand the risk they’re taking by not making the content sub-dividable? Will they accept extra expense if it has to be structured later?
  • How many sub-dividable units are there? The stakes are much lower with a dozen than with 15,000. Additionally, as the number of units goes up, so does the administrative overhead of managing them all as a group (finding the one that changed out of a dozen is easier than out of 15,000, no matter how good your diffing tool).
  • What is the technical sophistication of the end-user? How well do they understand content management? Can they grasp the concept of compositing a page out of sub-elements, or are they going to be confused if what’s in the editor doesn’t look like what’s on the page?
  • How often is the content going to change? When it does change, how often will it be a single element? Will changes to individual units overlap? If one unit is in the middle of a workflow, would it help to be able to send another one through workflow separately?
  • How intricate is the formatting?
  • How do they want to order the individual records?
  • Is there any processing you need to do on sub-elements of a record? In the example, the image needs to be resized consistently every time. In other cases, you may need to do consistent calculations of user-supplied values that are best run through a stable algorithm.
  • Are permissions different on individual units? In the example, if partners can manage their own record, then they have to be permissioned separately.
  • How sophisticated is your CMS editing tool? Can it even do repeating elements at all? If you choose the second option, and structure within a single content object, how closely can the editing form resemble what’s on the page? (Put another way, how easily can you “trick” the end-user into structuring content by making it look like WYSIWYG?)

    (Incidentally, Ektron does a good job of this. You can make input forms with repeating elements that come very close to what the end result will be. Joseph Scott’s Edit in Place would do well here too.)
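For the image-processing factor, the step that needs to be consistent is usually computing a target size before handing the image to an imaging library. Here is a minimal Python sketch, assuming a hypothetical 150 x 100 pixel logo box and a no-upscaling rule; it only calculates dimensions and does not touch pixels.

```python
def fit_within(width: int, height: int, max_w: int, max_h: int) -> tuple[int, int]:
    """Scale (width, height) to fit inside max_w x max_h, keeping the aspect ratio.

    Images already small enough are left alone (no upscaling).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # The tighter of the two ratios wins; capping at 1.0 prevents enlarging.
    scale = min(max_w / width, max_h / height, 1.0)
    # round() keeps the result close to the true ratio; max(1, ...) avoids a zero side.
    return max(1, round(width * scale)), max(1, round(height * scale))

LOGO_BOX = (150, 100)  # hypothetical display size for partner logos
```

For example, `fit_within(600, 300, *LOGO_BOX)` returns `(150, 75)`, while a logo already inside the box, such as 120 x 80, comes back unchanged. Running every uploaded logo through one function like this is what "resized consistently every time" amounts to in practice.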

In writing this post, I tossed it over the pond to Josh Clark for his input. In his response, he captured one of the more succinct differences between how we (developers) look at content and how end users do. This, too, needs to play a role in your decision (emphasis mine):

The big advantage to structuring content, of course, is that it lets you repackage it and present it in different forms and contexts. The downside is that it forces editors to approach their content like machines, thinking in terms of abstract fields that might be mixed and matched down the road. The benefits often outweigh this usability cost if you’re going to present the content elements in multiple contexts and/or offer various sorting options with a large number of elements. If not, then I typically go with unstructured.

That’s brilliant, and it’s so true. Understand that structuring content can suck the soul out of the authoring process for a lot of people. Like Josh said, often the advantages are clear enough to justify some soul-sucking, but always approach this with care.

I remember a client for whom we were building a “case studies” section of their Web site. I kept trying to get them to structure the case studies. I would say things like:

If you kept your case studies in an Excel spreadsheet, and each row was a case study, what would the column headings be?

Now, this is a good question and one that’s worked well for me in the past, but this client was just not getting it. Finally, Joe said, “Dude, I think they just want a page…” And he was right. The client wasn’t thinking in terms of structure; they were thinking in terms of a page with stuff on it. They figured they could just WYSIWYG it up, and in the end, they did it this way and they were fine.

Postscript: So, what are we going to do with our original example? At this point, I can’t say, but I’m leaning away from pure WYSIWYG because of the image processing. If we get this client on eZ publish, I imagine we’ll use separate records, because eZ can’t repeat sub-elements within the same record. If we were to go with a CMS that allowed that, my inclination would be to do it as a single structured record.


Comments

  1. I think the most important questions you ask are “If your answer to the above is ‘no,’ how sure are you of that? How well does the end-user understand the risk they’re taking by not making the content sub-dividable?” In every project I’m involved with (whether it is for me or someone else), I prefer overkill. I’d rather have features available but unused than lack the needed features down the road.

    What I find difficult is not structuring the content, but structuring it in a way that is reasonable for the client. This goes with one of your other questions, “What is the technical sophistication of the end-user?” Unfortunately, most end-users either do not have the technical sophistication or know just enough to be dangerous. I designed a site once for an investment consultant, and we both agreed on the need for structure in the content. What neither of us ever really understood, though, was how best to structure it. My solution worked for him, but I don’t think he was ever fully pleased with it. He wanted it structured the way he did business, but since he didn’t do computers and I didn’t do investment… we never really came up with a perfect solution.

  2. Thanks for the useful post. Differences in how to structure content are definitely tough to discuss with non-technical site owners, and I think one way to discuss them is to present a bunch of different types of reports/screens that could be generated with varying levels of structure. Drawing out abstract structures is much less effective than describing the different outputs that are possible with different structures. I also think that we can end up over-complicating/over-structuring systems for our users, and for large CMS implementations (100,000+ pieces of content) the more straightforward approach is often better. Another consideration: if the input system gets too cumbersome/confusing/abstract for a user, they may either not use it or start putting in lower-quality content/metadata.

  3. I think the problem of subcontent structure is more common than the example in this article. I’ve noticed that I have this problem with links on my pages: I always simply put tags in the content, but if I move an article [or change its address in any way], the link breaks and I have to find it on every page and repair it.

    No CMS I know of treats in-content links in any special way. But I wonder if it would be possible to automate those links: if I change one, it changes everywhere I linked to it in my content. [Unbreakable links? hehe :-)] What if we treat a link as an element of content too? [subcontent]

    The same goes for inline images. Let’s assume I have an image file on a server, incorporated in a few pages of content. Next I move my site to another hosting server. It is possible to break the images that way, e.g. if the image file wasn’t stored in the database but in a filesystem directory. Or maybe I accidentally moved the image to another directory, or changed its name. Then it’s broken in all the places where it appears in content, because the tag was referring to a specific file on the server. So, even if I have that image as subcontent in a content tree, the CMS doesn’t know where in the content it’s referred to [in what place exactly]; it can only know the page, not the exact place. If the image location on the filesystem changes, all src=".." attributes need to be changed manually.

    There should be some way to tell the CMS “place that image here,” and when the image location changes, it should be able to update the generated tags automagically, without the need to find and edit them in the content pages manually.

    Any ideas?
