What is Content Integration?

April 27, 2015

Since I don’t feel there’s a good, all-encompassing name out there for this, I’m going to attempt to invent one –

Content Integration encompasses the philosophy, theories, practices, and tools around the re-use and adaptation of content from our core repository into other uses and channels, or vice-versa: the creation and ingestion of content from other channels into our core repository.

Traditionally, we create content and store it in a repository. In many cases, this repository is also a delivery channel. A web content management system (WCMS) is the perfect example – we create the content in the WCMS, store it there, and deliver it from there. In many cases, our content stays entirely locked within the bounds of our WCMS. The entire lifecycle of that content—creation, management, delivery, archival, and deletion—happens inside of that system.

Content Integration would be the process by which we connect to content in that repository and use it in some other way. Content Integration occurs every time we connect a content-based system to the “outside world” to take in or push out content to other systems to allow for creation or consumption by other means.

For example –

  • We create an announcement for our company intranet. We also want to email this announcement without having to create separate content for the email.
  • We have four corporate websites, each running on a different CMS. We have a single Privacy Policy that is reviewed, modified, and re-published by our legal department once a quarter. When this happens, the text of the policy should be pushed out to each website automatically.
  • Employees of our company submit Improvement Suggestions via a Word document. These are reviewed, metadata is added via document properties, and items worthy of further discussion are moved into a separate location by an admin assistant. Files in this location need to be consumed and automatically published to the Improvement Committee section of our intranet.
  • Our latest financial projections need to be published to the investor relations section of our website, and to seven different reporting services. Each service has slightly different formatting and composition requirements, so our financial projection content has to be able to adapt to each one.
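The pattern behind all four scenarios is the same: one canonical content item, transformed by per-channel adapters rather than duplicated. A minimal sketch (every name here is illustrative, not tied to any particular CMS):

```python
# One canonical content item, adapted per channel instead of duplicated.

def to_email(item):
    """Render an announcement as a plain-text email body."""
    return f"Subject: {item['title']}\n\n{item['body']}"

def to_web(item):
    """Render the same announcement as an HTML fragment."""
    return f"<article><h1>{item['title']}</h1><p>{item['body']}</p></article>"

announcement = {"title": "New parking policy", "body": "Lot B closes Friday."}

# Adding a channel means adding an adapter, not re-authoring content.
channels = [to_email, to_web]
outputs = [render(announcement) for render in channels]
```

The content lives once, in the repository; each channel's quirks live in its adapter.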
Content management vendors tend to silently wage war against Content Integration by adding features to their systems in an effort to remove the need to go “outside” that system. In the first example above, WCMS vendors often built entire email messaging platforms into their systems to allow for this functionality in addition to the core web publishing.

This is done in the name of sales demos and competitive advantage, but weakens the product overall because no vendor can ever predict all the possible ways content can be re-used. (While it’s easy to blame vendors, the guilt can probably be laid at the feet of their customers, who—being ignorant of the concepts of Content Integration—have historically equated “built-in” with “superior.”)

To circle back to the original definition, Content Integration is multi-disciplinary. It encompasses:

  • Philosophy: How do we adopt the mindset that content is divorced from channel?  That message and medium are not the same thing, and a message can be carried over multiple media? How do we evangelize this philosophy to the entire organization?
  • Theories: What are the core paradigms of working with content? What is content, itself? What is a repository?  What is a channel?
  • Practices: How do we design content for integration? How do we manage it in such a way that it can be re-used? What governance and workflow situations arise from the usage of content in multiple locations?
  • Tools: What type of repository allows us to integrate our content easily? What channel products and services are designed for content integration? What content management systems allow for the easy import/export of content for re-use?
In the end, Content Integration is an umbrella over a collection of knowledge and technology which, in combination, allows us to get more value out of our content – to reach greater numbers of content consumers, at less cost, with greater control, and less risk.


RSG WCM Survey

February 10, 2015

Tony and the crew from Real Story Group have embarked on a broad survey of WCM usage and implementation patterns, which I think is worth taking.  The survey is here:

Survey: Web Content & Experience Management

I don’t think enough of this happens in the industry. As a group, we lack self-reflection and reporting.  Some of the questions are so basic, yet incredibly opaque from the outside.

If you complete the survey, you can elect to get a summary of the results. That alone makes it worthwhile.


Editorial Scripting in CMS

January 29, 2015

For years, I’ve been quite interested in the idea of scripting within a CMS.  By “within,” I mean scripting inside of managed content – using some learnable language or declarative syntax to get the CMS to perform actions when publishing content.

This clearly sounds weird, so here’s an example –

Say we have an editor who wants to display a dynamic table of data on a page.  This is data that comes from some DB-ish datasource outside the CMS.  Perhaps a list of locations, or something else.

Conventional practice gives us two options: we could (1) bring this data into the CMS itself, as managed content; or (2) leave the data where it is, and create custom code that connects to the database at the CMS level, then retrieves and formats the information.  Of course, either way will require us to do the dreaded “custom development” on our CMS implementation.

But is there perhaps another way?

Could we perhaps create a content type called “SQL Recordset” which contains an editor-controlled SQL query?  When this content renders, the SQL query is executed against a datasource, and the results are displayed as content.  The end consumer doesn’t know the actual “content” is the SQL query that generated it, but that’s not important.  Sure, our editor would have to understand basic SQL (only as it relates to this problem) and the structure of the datasource, but let’s pretend this is feasible.
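A rough sketch of that “SQL Recordset” idea, with SQLite standing in for whatever external datasource the CMS would actually connect to (the function and table names are invented for illustration):

```python
import sqlite3
from html import escape

def render_recordset(query, connection):
    """Execute an editor-supplied SQL query and render the results as an
    HTML table -- the only 'content' the end consumer ever sees."""
    cursor = connection.execute(query)
    headers = [col[0] for col in cursor.description]
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in cursor.fetchall()
    )
    return f"<table><tr>{head}</tr>{body}</table>"

# Demo datasource standing in for the external database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (city TEXT, phone TEXT)")
conn.execute("INSERT INTO locations VALUES ('Sioux Falls', '555-0100')")

html = render_recordset("SELECT city, phone FROM locations", conn)
```

In practice you'd run the editor's query against a restricted, read-only connection, since the query is effectively untrusted input.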

Could we take this a step further by allowing the editor to supply a template – HTML with templating controls (à la Smarty, or Twig, or DotLiquid) – and apply that template to the recordset?  Or return the SQL results as XML, and transform it with XSL (ewwwww, I know…but it works). The resulting HTML might not even look like a SQL recordset – hell, a competent editor might make it come out as a blog.  Essentially, they’re content-managing scripts, which are executed at request time.
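The editor-supplied-template step might look like this sketch, where Python's built-in `string.Template` stands in for a richer engine like Twig or DotLiquid (all field names are hypothetical):

```python
from string import Template

def render_with_template(rows, row_template):
    """Apply an editor-supplied template to each row of a recordset.
    string.Template is a stand-in for Smarty/Twig/DotLiquid here."""
    tmpl = Template(row_template)
    return "\n".join(tmpl.substitute(row) for row in rows)

rows = [
    {"title": "First post", "date": "2015-01-01"},
    {"title": "Second post", "date": "2015-01-08"},
]

# The editor manages this template as content, not code -- which is
# why the output doesn't have to look like a recordset at all.
editor_template = '<h2>$title</h2><p class="date">$date</p>'

html = render_with_template(rows, editor_template)
```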

Now, before you freak out (it’s probably too late), let me explain the reasons why this interests me –

First, there are different types of editors.  There are “normal” editors that just want to create content by filling out forms, and then there are “power editors,” who want as much control as they can get.  They’re not full-blown developers, but they have some concept of programming principles, enough that you could teach them a simple language and have them get results without them tying up a developer with a bunch of requests.

Second, there are different types of content problems. There are problems so foundational that you need a developer to solve them. But there are other problems which are just not that complicated, very idiosyncratic to the editor/content (meaning you’re not going to need to solve the same problem every day), and perhaps you just don’t need to re-build and re-deploy your CMS implementation to solve them.

You want to display the weather in Moscow on your intranet page?  Well, this is not a common request, so I’m not going to build a framework for it, and you’re just some random dude in the organization, so you don’t have the right to tell me to develop this and re-deploy the app.  But what if there was a simple scripting language inside your CMS which would enable you to make a call to the Open Weather Map API, extract the data you want, format the results and inject it into a content-managed page?  Would that work?
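The weather scenario, sketched out. This assumes OpenWeatherMap's current-weather endpoint and JSON shape as publicly documented (verify both before relying on them), and the API key is a placeholder:

```python
import json
from urllib.request import urlopen

# Endpoint and response shape are assumptions based on OpenWeatherMap's
# public API; check the current docs before using.
API_URL = ("https://api.openweathermap.org/data/2.5/weather"
           "?q={city}&units=metric&appid={key}")

def extract_weather(payload):
    """Pull the handful of fields an editor would actually display."""
    return {
        "description": payload["weather"][0]["description"],
        "temp_c": payload["main"]["temp"],
    }

def weather_snippet(city, api_key):
    """Fetch at request time and return an HTML fragment for the page."""
    with urlopen(API_URL.format(city=city, key=api_key)) as resp:
        payload = json.load(resp)
    w = extract_weather(payload)
    return f"<p>Weather in {city}: {w['description']}, {w['temp_c']}&deg;C</p>"
```

The point isn't the ten lines of Python; it's that an editor-facing scripting layer could express the same fetch/extract/format steps without a developer deploying anything.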

Third, even if I’m a trained developer, some problems are so simple that perhaps we should solve them at a level that doesn’t require us to mess with the “foundational” code of the implementation.  What if we split our implementation into “foundational” and “editorial” layers, and decided that we could solve some problems in the editorial layer?

For a highly dynamic implementation (think intranet), perhaps the core CMS implementation itself is more of a framework, and we have an embedded scripting container to solve highly specific, one-off problems at the content/editorial level, rather than the code level.  Perhaps there can be another category of lightweight developer that can solve simple problems that editors have without having to escalate to a “full” developer?

Yes, there are numerous issues here, and the idea of editors having access to a programming environment is a little scary, but I’m curious to see how viable this is.  To what extent could editors be trained on, understand, and use some simple scripting tool?

Lately, I’ve been playing with some ideas.

  • The first one is a “text filter pipeline” (it really needs a better name) which grew out of the development of a simple file include-ish feature for EPiServer.  An early version is on GitHub. The idea is an extremely simple scripting-ish language that editors can use to inject external data into managed content.  I’ve kept the language as simple as possible, while still making it fairly powerful and extensible.  It’s still very much in development, but take a look at the README for an example of what I’m talking about (and a working example of the “weather” scenario I mentioned above).
  • The second one is straight up server-side JavaScript injection.  I’m playing around with Jurassic, and I have a prototype of server-side JavaScript executed at request time within EPiServer HTML content (technically, in a SCRIPT tag with a “type” of “text/server-js”).  The difficulty is exposing a read-only EPiServer API to the JavaScript, but I’m getting there.  It’s quite possible, and ECMAScript 3 would give an editor an essentially Turing-complete language in which to do…stuff.
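The first idea above – a token language that injects external data into managed content – might be sketched like this (the `{handler:argument}` syntax is invented for illustration; it's not the syntax of the actual GitHub project):

```python
import re

# Handlers are registered by the implementation; editors only write tokens.
def handle_upper(arg):
    return arg.upper()

def handle_include(arg):
    # In a real pipeline this might fetch a URL or another content item.
    snippets = {"footer": "&copy; 2015 Example Corp"}
    return snippets.get(arg, "")

HANDLERS = {"upper": handle_upper, "include": handle_include}

TOKEN = re.compile(r"\{(\w+):([^}]*)\}")

def run_pipeline(text):
    """Replace {handler:argument} tokens in managed content at request time.
    Unknown tokens are left untouched rather than erroring."""
    def replace(match):
        name, arg = match.group(1), match.group(2)
        handler = HANDLERS.get(name)
        return handler(arg) if handler else match.group(0)
    return TOKEN.sub(replace, text)

content = "Welcome to {upper:acme corp}. {include:footer}"
result = run_pipeline(content)
```

The editor learns a handful of tokens; the dangerous parts (what a handler is allowed to fetch or execute) stay in code the developers control.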
Yeah, yeah – a lot of you are freaking out right now.  I get it, and I’m not saying this isn’t fraught with potential security, training, and governance issues.  But it’s interesting as hell, and I’m determined to see just how viable it is from a practical standpoint.

Also, I know that this isn’t new.  I have seen things like this before (DekiScript for MindTouch, for instance).  I don’t think I’ve seen it done really well, and perhaps there’s a reason for that.

Even if this doesn’t work out how I hope it will, I stand to learn a lot about the average CMS editor, what they want, and where their threshold of complexity lies.

Stay tuned.


Accidental Bitcoin Centralization

January 23, 2015

Blockchain scalability: As Bitcoin gets bigger, the history of transactions (which is required to make the whole thing work) gets less manageable, leading to centralization, which is the antithesis of the whole idea.

We can already observe empirically that more than 50% of the hashpower securing the network right now is owned by just five entities – see figure 1. This is a real security threat. Five is a small enough number that state-level actors could directly coerce all five entities without too much trouble. Five is also small enough that active collusion would be fairly easy to coordinate.

It’s not getting better.

The bitcoin blockchain is presently about 25 GB in size. Downloading the blockchain peer-to-peer takes about 48 hours, and of course 25 GB of disk space. This is a serious user experience flaw…


We Suck at HTTP

January 7, 2015

I absolutely loved this New York Times column which lamented the world of apps, where we don’t have the capability to link to content anymore:

Unlike web pages, mobile apps do not have links. They do not have web addresses. They live in worlds by themselves, largely cut off from one another and the broader Internet. And so it is much harder to share the information found on them.

Yes, yes, for the love of God yes.

We have broken HTTP.  We’ve done it for years in fits and starts, but apps have completely broken it.  HTTP was a good specification which we’ve steadily whittled away.

URLs have a purpose.  We are very cavalier about that purpose. We don’t use canonicals. We’re sloppy about switching back and forth between HTTP and HTTPS.  We don’t bother to logically structure our URLs.  We rebuild websites and let all the links break. We don’t appreciate that crawlers are dumb and they need more context than humans.

Did you know there’s something called a URN – Uniform Resource Name?  This was supposed to be one level above a URL.  Your resource would have a URN, which would be a global identifier, and it would resolve to a URL, which was just where the resource was located right now.  URNs never caught on, but the web would be better if they had.  Content could then have a “name” which was matched to it forever, regardless of its current URL.  (The “guid” element in RSS probably should have been named “urn,” in fact.)
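The URN-over-URL idea fits in a few lines: the URN is the permanent name, the URL is just where the resource happens to live today. A toy resolver (the registry contents are invented for illustration):

```python
# Maps permanent names to current locations. In the real vision this
# would be a distributed resolution service, not an in-memory dict.
URN_REGISTRY = {
    "urn:example:article:we-suck-at-http": "https://example.com/posts/http",
}

def resolve(urn):
    """Return the current URL for a permanent URN, if we know it."""
    return URN_REGISTRY.get(urn)

def move(urn, new_url):
    """The resource moved; its name did not. Old links keep working."""
    URN_REGISTRY[urn] = new_url
```

Rebuild your website, move everything, and every inbound reference still resolves -- which is exactly the link rot this post is complaining about.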

And it’s not just URLs.  HTTP status codes exist for a reason too.  Did you know that there are a lot of them?  In fact, there’s one for about everything that could happen for a web request.  Did you know there’s a difference between 404 and 410?  404 (“Not Found”) means the server can’t find the resource – it makes no claim about whether it ever existed.  410 (“Gone”) means it was once here but has been permanently removed.  Big difference.

Ever hear of 303 and 307?  The human-readable descriptions are “See Other” and “Temporary Redirect” – 303 sends the client to a different resource (typically after a POST), while 307 redirects temporarily and preserves the request method.  Did you know there was a “402 Payment Required”?  There’s a bunch that were just never implemented. These days a lot of websites just return “200 OK” for everything, even 404s, which drives me freaking nuts.  (And, yes, I’m sure I’ve done it, so don’t go looking too hard through my portfolio…)

(A new company called Words API (it’s an API…for words) made me jump for joy when I saw they are using actual, intelligent HTTP status codes on their responses, even their errors.  If you go over your usage limit, for example, you get a “429 Too Many Requests” back. Good for them.)
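Picking the right code instead of “200 for everything” is mostly a dispatch decision. A minimal sketch using Python's standard-library `http.HTTPStatus` (the function and its arguments are invented for illustration):

```python
from http import HTTPStatus

def status_for(path, existing, gone, hits, limit=100):
    """Pick the right status code instead of '200 OK for everything'."""
    if hits.get(path, 0) >= limit:
        return HTTPStatus.TOO_MANY_REQUESTS   # 429: over the usage limit
    if path in existing:
        return HTTPStatus.OK                  # 200: here it is
    if path in gone:
        return HTTPStatus.GONE                # 410: was here, permanently removed
    return HTTPStatus.NOT_FOUND               # 404: server can't find it
```

Crawlers, caches, and API clients all behave better when the code actually matches what happened.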

Do you know why your FORM tag has an attribute called “method”?  Because you’re calling a method on a web server, like a method on an object in OOP.  Did you know there are other methods besides GET and POST?  There’s HEAD and OPTIONS and PUT and DELETE.  And you can write your own.  So if you’re passing data back and forth between your app/site and your web server, you’re welcome to name custom methods in the first line of the request.

And, technically, you’re supposed to make sure GET requests are safe and idempotent, meaning they can be repeated with no changes to a resource.  So you should be able to hit Refresh all day on a GET request without causing any data change (beyond perhaps analytics).  If you’re changing data on a server, that should always be a POST request (or PUT or DELETE, if anyone ever used them as intended).
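The GET/POST split, reduced to its essence (a toy sketch, not any particular framework's API):

```python
class Resource:
    """GET is safe and idempotent; POST is the one that mutates."""

    def __init__(self):
        self.items = []

    def get(self):
        # Hitting Refresh all day changes nothing on the server.
        return list(self.items)

    def post(self, item):
        # Each POST changes server state -- never do this on a GET.
        self.items.append(item)
        return len(self.items)
```

This is exactly why browsers warn you before re-submitting a form (a POST) but happily re-fetch a page (a GET) without asking.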

I could go on and on.  Don’t even get me started about URL parameters. No, not querystrings – there was originally a specification where you could do something like “/path;key1=value1;key2=value2” to pass data inside the path of a request. And what about the series of “UA-*” headers that existed to tell the web server information about the rendering capabilities of the user agent?  (And dare I wander off into metadata-related ranting…two words people, Dublin Core!)

My point is that a lot of web developers today are completely ignorant of the protocol that is the basis for their job.  A core understanding of HTTP should be a base requirement for working in this business.  To not do that is to ignore a massive part of digital history (which we’re also very good at).

I’m currently working through HTTP: The Definitive Guide by O’Reilly.  The book was written in 2002, but HTTP hasn’t changed much since then.  It’s fascinating to read all the features built into HTTP that no one uses because they were never adopted or no one bothered to do some research before they re-solved a problem. There’s a lot of stuff in there that solves problems we’ve since programmed our way around.  The designers of the spec were probably smarter than you, it turns out.

(HTTP/2 is currently proposed, but it doesn’t change much of the high level stuff.  The changes are mostly low-level data transport hacks, based on Google’s experience with SPDY.)

At risk of sounding like a crabby old man (I’m 43 and have been developing for the web since 1996), this is one small symptom of a larger problem – developers tend to think they can solve every problem, and they’re pretty sure that nothing good happened before they arrived on the scene. Anyone working in this space 20 years ago couldn’t possibly have understood today’s problems, so every problem deserves a new solution.

Developers often don’t know what they don’t know (that link goes to my personal confession of this exact thing), and they feel no need to study the history of their technology to gain some context about it.  Hell, we all need to sit and read The Innovators together.

Narcissism runs rampant in this industry, and our willingness to throw away and ignore some of the core philosophies of HTTP is just one manifestation of this.  Rant over.


America’s CTO

January 5, 2015

Adviser Guides Obama Into the Google Age: A profile of the new U.S. CTO Megan Smith. The transition is rocky:

Not only does she now carry a BlackBerry, she uses a 2013 Dell laptop: new by government standards, but clunky enough compared with the cutting-edge devices of her former life that her young son asked what it was.

Additionally, the position is suspected of being nothing but a figurehead.

The problem, technology experts say, is that the mandate of the chief technology officer has been nebulous since Mr. Obama created the job five years ago, not least because it does not come with a substantial funding stream, a crucial source of power in the government.

No money, no power.


Qualcomm: The Monopoly You’ve Never Heard Of?

January 5, 2015

The title of this post has nothing to do with this point, which I found interesting:

[…] no one seems to be paying attention to Qualcomm’s incredible chipset dominance in mobile. Android and Snapdragon look an awful lot like Windows and Intel; every hardware maker except for Apple is beholden to the two giants behind the platform. (And even Apple uses Qualcomm’s LTE radios and other chips in the iPhone.) Qualcomm has an incredible patent moat and it seems to be pushing Snapdragon along right on schedule, but that’s exactly where Intel was before the smartphone explosion sent its roadmap spinning wildly off course. The coming explosion of internet of things devices, wearables, and other sensor-laden gadgets is a huge opportunity for every company that failed in mobile, and a dangerous moment for Qualcomm.


Crypto-Hacking Case Study

January 4, 2015

How My Mom Got Hacked: Interesting look at a case of crypto-hacking. Turns out that actually paying the ransom was the difficult part.

By the time my mom called to ask for my help, it was already Day 6 and the clock was ticking. (Literally — the virus comes with a countdown clock, ratcheting up the pressure to pay.) My father had already spent all week trying to convince her that losing six months of files wasn’t the end of the world (she had last backed up her computer in May). It was pointless to argue with her. She had thought through all of her options; she wanted to pay.


It’s a Young Developer’s World

December 24, 2014

Silicon Valley’s Youth Problem: Silicon Valley and start-up culture is dysfunctional.  So many great quotes in this piece.

[…] what matters most is not salary, or stability, or job security, but cool. Cool exists at the ineffable confluence of smart people, big money and compelling product. You can buy it, but only up to a point. For example, Microsoft, while perpetually cast as an industry dinosaur, is in fact in very good financial shape. Starting salaries are competitive with those at Google and Facebook; top talent is promoted rapidly. Last year, every Microsoft intern was given a Surface tablet; in July, they were flown to Seattle for an all-expenses-paid week whose activities included a concert headlined by Macklemore and deadmau5.  Despite these efforts, Microsoft’s cool feels coerced.

As a guy pushing 44, this resonates:

If you are 50, no matter how good your coding skills, you probably do not want to be called a “ninja” and go on bar crawls every weekend with your colleagues, which is exactly what many of my friends do.

And what are we losing for the world when the top technical talent wants to work at companies that do – let’s face it – stupid, meaningless things?

Why do these smart, quantitatively trained engineers, who could help cure cancer or fix healthcare.gov, want to work for a sexting app?

[sigh] I’m old.


Bitcoin for the Befuddled

December 7, 2014

I got a review copy of “Bitcoin for the Befuddled” from No Starch Press (a publisher I’ve really enjoyed over the years).  The title is an accurate description of where I’m at on Bitcoin – I have a basic understanding of it, but the intricacies are escaping me.

Sadly, Dunning-Kruger being what it is, reading the book left me more confused, but most likely in a good way – at least now I can grasp the scope and depth of what I don’t understand.  I’m in a better position to figure things out after reading the book, even if I don’t have all the answers now. (You can’t Google “blockchain” without knowing that the concept exists and is “Google-able”…)

Here’s a random sample of what I learned (in an attempt to cement my own knowledge, if nothing else):

  • Bitcoin is a weird thing. Creating it was almost a magic discovery of some intersection between cryptography, economics, and game theory. The entire thing seems both precarious and stable, like a perfectly interlocking house of cards. It’s like a Mexican Standoff that works to everyone’s benefit.
  • The guy (girl?) who “discovered” it did so in a 2008 paper anonymously published to a cryptography mailing list. He is known as “Satoshi,” but no one knows who he really is. He hasn’t been heard from in years. (Sounds like a movie plot, I know.)  He claimed to be a 37-year-old man from Japan, but many people don’t believe that.
  • The core logical basis of Bitcoin is that the ledger – the entire history of transactions – is public and everyone has it. So everyone can recreate the entire history of Bitcoin transactions, and everyone confirms that it’s valid every 10 minutes. With this process in place, no one can cheat the system because the entire history of the currency is in the open.
  • This ledger is known as “the blockchain,” which is a – wait for it – chain of blocks, which are packets of information. Every block has a hashcode from the block before it.  Think about that, for a second – if each block verifies the one before it, then you can trace the validity of the chain backwards to the very first block/transaction (the “genesis block”).  The validation of the last block in the chain (a new one is generated every 10 minutes or so) effectively validates the entire chain. That is just elegantly beautiful.
  • There’s an astonishing amount of cryptography involved. Without crazy math, Bitcoin wouldn’t exist.
  • There’s almost an equal amount of game theory involved.  There are weaknesses in the system that are covered by other strengths which make pursuing the weaknesses non-profitable and therefore pointless.
  • Storing Bitcoins can be a potentially complicated thing, depending on how much you have to store and how secure you want it. It can involve offline computers, “hot” or “cold” wallets, and even…
  • …paper. Bitcoins can be stored on paper. So long as you can codify a cryptographic key value as a QR code, then there’s nothing stopping you from printing out $1 billion in Bitcoins, throwing away all digital record of it, and turning your filing cabinet into the most valuable piece of furniture in history.
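The hash-chaining described a few bullets up is small enough to sketch. This is a toy model of the structural idea only – real Bitcoin blocks carry headers, Merkle trees, and proof-of-work, none of which appear here:

```python
import hashlib
import json

def block_hash(block):
    """Hash a block's contents -- including the previous block's hash."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_block(transactions, prev_hash):
    return {"transactions": transactions, "prev_hash": prev_hash}

def valid_chain(chain):
    """Each block must reference the hash of the block before it, so
    validating the tip implicitly validates everything behind it."""
    for prev, block in zip(chain, chain[1:]):
        if block["prev_hash"] != block_hash(prev):
            return False
    return True

genesis = make_block(["genesis"], prev_hash="0" * 64)
b1 = make_block(["alice->bob: 5"], prev_hash=block_hash(genesis))
b2 = make_block(["bob->carol: 2"], prev_hash=block_hash(b1))
chain = [genesis, b1, b2]
```

Tamper with any historical transaction and every hash downstream of it stops matching -- that's the "trace the validity backwards" property in about twenty lines.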
I’m still fuzzy on some other things:
  • How big does the blockchain get before it’s unwieldy?  The chain is something like 20GB now, which means a full copy of it might take days to download to populate your wallet.  This will only get worse as Bitcoin gets more and more popular. (I did some research – people have put a lot of thought into this.)
  • Doesn’t the system require Internet access?  This point seems obvious, but I’m wondering how much of a limitation this would be with wider adoption.
  • How do nodes on the Bitcoin network connect? This is glossed over – nodes just seem to magically find each other.  I assume they connect over some standard port/protocol, but then wouldn’t that be ripe for DOS attack? I know the book wasn’t intended as a networking reference, but at least some information on that would have helped me visualize it in my head.
Overall, the book is well-done.  It does a nice job of gauging where you might be at, and attacking the problem from multiple sides.  Bitcoin is a frustratingly slippery thing – you think you have it figured out, then the zen of it falls out of your head for a second and you have to fight to get it back. I’m sure there are people for whom this is all crystal clear, but I am not one of them.

The book has narrative sections and, interestingly, a full-length comic book right in the middle of it. There’s a chapter about the cryptographic basis of Bitcoin that had a lot of math and graphs. I admit to skimming that one a bit.  There’s also perhaps an over-abundance of analogies. You start confusing them with each other after a while.

(Also, weirdly, the book is full of typographical errors.  I found three of them in a two-paragraph stretch, at one point.)

All in all, the book fulfilled its promise.  I was befuddled.  I’m still a little confused, but I’m light years ahead of where I was. Let’s call this book a primer – it gets you started, and gives the basic knowledge required to learn more, if that’s what you decide to do. Honestly, I feel like I probably know enough at this point to trust Bitcoin and perhaps become a user of it.  If I ever decided to mine it (something the book highly discourages) or develop against it, I would clearly need to know more.

But, for now, this is enough.

(If you want to see the first two chapters, where some of the core theory lies, the page at No Starch has a free download.)