The Flying Squirrel Book

July 14, 2015

For the last few months, I’ve been working on a book for O’Reilly: Web Content Management: Systems, Features, and Best Practices.  It’s a reasonable distillation of everything I’ve learned about web content management in almost two decades of success and failure (hopefully more of the former than the latter…).

O’Reilly has an interesting system of pre-order, whereby you can buy the book and get a series of e-book chapters as it’s written. They’re rough and not finished, but you can start reading the book before it’s even published. My book has had two sets of chapters released — 1-4 and 5-7 — and is due for 8-10 in a couple of weeks.  (I’m not sure if there will be another, since the entire book is tentatively planned at 14 chapters.)  You can pre-order here.

O’Reilly books are well-known for their covers of hand-drawn animals.  To my dismay, I learned that you don’t get to pick — the O’Reilly art department instead proposes designs to you, and you have a limited ability to object. They first came up with a dog, which absolutely wasn’t going to happen (I don’t like dogs).  Then a bird, which just didn’t seem right either.  Then my editor called and said, “Look, we can go back to them one more time, but you’re gonna have to make one of the three work.”

God was apparently smiling on me, because that third animal was the pygmy flying squirrel, which frankly couldn’t be more awesome.  Thus, my book’s website: The Flying Squirrel Book.

The process of writing has been more difficult than I expected. Finding the time is a huge challenge, but the hardest part has been fighting the image of perfection in my head. I’ve waited so long to write this book that I avoid writing because I’m afraid the actual words won’t live up to my expectations.

Additionally, “walking back the knowledge” has been challenging.  I’ll go to explain Concept A, then get a few pages in before I realize that I’m not explaining Concept A, but Concept D. I actually have to back up and explain Concepts A, B, and C first.  Honestly, you don’t realize what you actually know until you have to write it all down.

The mechanics of writing have been fun.  O’Reilly has a system they call Atlas (alas, not available commercially yet), which allows me to write in a Markdown-ish format called AsciiDoc.  I maintain these files in a Git repo.  They are then converted to HTML (I could write directly in HTML, if I wanted).  The HTML is then turned into all sorts of formats, in almost real-time.  I can “build” the book into print-ready PDF, MOBI, HTML, etc. in seconds.

So, that’s what I’ve been working on since last year. With a bit of luck, the book will be out in the fall.  Once I finish writing, we go to “pre-production,” where it gets copy-edited and indexed, then apparently I get a box in the mail with a bunch of books.

That’ll be a good day.


The Limitations of Screen Reading

July 9, 2015

Everything Science Knows About Reading On Screens: More proof that reading on the screen is not the same as reading on paper.

But this style of reading may come at a cost—Liu noted in his study that sustained attention seems to decline when people read onscreen rather than on paper, and that people also spend less time on in-depth reading. “In digital, we can link in different media, images, sound, and other text, and people can get overwhelmed,”

[...] The researchers found that when people read short nonfiction onscreen, their understanding of the text suffered because people managed their time poorly compared with when they used paper.

There’s even a slight difference between reading on paper and reading on a Kindle.
Mangen explains that the tactile feedback of paper may help people process certain information when they read, and this may be lost when we move to digital texts.


Just My Type

June 27, 2015

This is a book about fonts (or type, or typefaces — I’ve learned there are subtle variations in the definitions of each, but I can’t remember what they are). The book is a series of anecdotes about fonts/types. Each chapter is short (you can read one in 3-4 minutes), and they’re all pretty entertaining.  Some things I learned:

  • Type fans are kind of a cult.  There’s an entire world of type designers and fans that you don’t even know exists. They freak out when a font changes somewhere, and they have massive arguments and flame wars on the internet about what font a particular company uses. The worst thing to hear in these situations is “Verdict: Not a Font,” which means the type was something designed specifically for the logo or service mark.
  • A font company is called a “foundry.”  There have been some legendary foundries over the years, like International Typeface Corporation (ITC), the initials of which appear before a lot of their fonts.
  • There is a character called an “interrobang,” which is a combination of an exclamation point and a question mark, meant for things like “You did what!?”  A guy came up with it in the 1960s, but it never really caught on.
  • There’s a huge war between Arial and Helvetica.  There’s even a College Humor video where their respective gangs meet for a rumble (the last 15 seconds are pretty funny).
  • Universal also kind of screwed over Helvetica, because it was free.  So people stopped buying Helvetica.
  • Microsoft has had a pretty significant influence on fonts, given the ubiquity of Microsoft products.  Microsoft commissioned Tahoma, Georgia, Verdana, and Calibri. Microsoft is also responsible for the victory of Arial, due to its default inclusion in Microsoft products. The same is true of Times New Roman, which was commissioned in 1931 for the Times newspaper, but has lived on because Microsoft has bundled it in with their products for years.
  • Gill Sans is named for Eric Gill, who designed it and was, incidentally, a sexual pervert.  He apparently regularly molested his daughters and experimented sexually with the family dog.
  • The Nazis outlawed Gothic script in 1941, believing it to be too associated with the Jews.
  • The word type designers favor when testing their fonts is “handgloves,” because of the way the different letters interact and its unique kerning properties.
  • There is software just for designing fonts.  For example “Fontographer,” by Fontlab.
  • There is debate as to the origin of the word “font.”  Some think it derives from “fund,” as in a “fund of letters,” on which the printer would draw.  Others think it comes from the French word “fonte” which means “cast,” because the letters were originally cast in lead.
  • In 1977, a British newspaper created a pretty funny April Fools’ Day hoax, using font terminology to invent a fictional island which was supposedly marking 10 years of independence.
  • Gotham is a font well-known for being used on the iconic Obama 2008 campaign posters.
  • Johnston Sans is legendary for being used on signage in the London Underground for decades.
  • It’s not possible to copyright a font.  You’d have to copyright each individual letter and symbol, which is prohibitive. Thus, derivation in the font world is common, and some of the great type designers have died penniless after being unable to support themselves.
The entire book is full of these stories.  It’s well-written and frequently funny.  It does occasionally get into touchy-feely emotional talk about the design characteristics of fonts, which I didn’t quite get, but designers will no doubt love it.  Absolutely worth the read, and so much better than the other book I read about fonts last year: Stop Stealing Sheep & Find Out How Type Works (here’s my Goodreads review of that book, which I really disliked).



Author Payment by the Page

June 22, 2015

Amazon Kindle Direct Publishing: Interesting times:

Under the new payment method, you’ll be paid for each page individual customers read of your book, the first time they read it.
Of course, there are debates about what a “page” means when it comes to ebooks.
To determine a book’s page count in a way that works across genres and devices, we’ve developed the Kindle Edition Normalized Page Count (KENPC). We calculate KENPC based on standard settings (e.g. font, line height, line spacing, etc.), and we’ll use KENPC to measure the number of pages customers read in your book, starting with the Start Reading Location (SRL) to the end of your book. Amazon typically sets SRL at chapter 1 so readers can start reading the core content of your book as soon as they open it.
See: eBooks and the Vanishing Concept of the Page



Creating Fake Facebook Accounts

June 22, 2015

Inside a counterfeit Facebook farm: This is the process a “Facebook Account Mill” goes through to create a new account.  I found it fascinating.
She starts by entering the client’s specifications into the website Fake Name Generator, which returns a sociologically realistic identity: Ashley Nivens, 21, from Nashville, Tennessee, now a student at New York University who works part-time at American Apparel. Casipong then creates an email account. The email address forms the foundation of Ashley Nivens’ Facebook account, which is fleshed out with a profile picture from photos that Braggs’ workers have scraped from dating sites. The whole time, a proxy server makes it seem as though Casipong is accessing the internet from Manhattan, and software disables the cookies that Facebook uses to track suspicious activity. Next, Casipong inserts a SIM card into a Nokia cellphone, a pre–touch screen antique that’s been used so much, the digits on its keypad have worn away. Once the phone is live, she types its number into Nivens’ Facebook profile and waits for a verification code to arrive via text message. She enters the code into Facebook and — voilà! — Ashley Nivens is, according to Facebook’s security algorithms, a real person. The whole process takes about three minutes.
Interesting how email is the bedrock of the process. Increasingly, everything is tied to an email account. For its part, Facebook knows this is an issue:
This February, Facebook stated that about 7 percent of its then 1.4 billion accounts were fake or duplicate, and that up to 28 million were “undesirable” — used for activities like spamming. In August 2014, Twitter disclosed in filings with the Securities and Exchange Commission that 23 million — or 8.5 percent — of its 270 million accounts were automated.
I also quietly mourn for a culture in which this is a thing that has to happen.


What is Content Integration?

April 27, 2015

Since I don’t feel there’s a good, all-encompassing name out there for this, I’m going to attempt to invent one –

Content Integration encompasses the philosophy, theories, practices, and tools around the re-use and adaptation of content from our core repository into other uses and channels, or vice-versa: the creation and ingestion of content from other channels into our core repository.

Traditionally, we create content and store it in a repository. In many cases, this repository is also a delivery channel. A web content management system (WCMS) is the perfect example – we create the content in the WCMS, store it there, and deliver it from there. In many cases, our content stays entirely locked within the bounds of our WCMS. The entire lifecycle of that content—creation, management, delivery, archival, and deletion—happens inside of that system.

Content Integration would be the process by which we connect to content in that repository and use it in some other way. Content Integration occurs every time we connect a content-based system to the “outside world” to take in or push out content to other systems to allow for creation or consumption by other means.

For example –

  • We create an announcement for our company intranet. We also want to email this announcement without having to create separate content for the email.
  • We have four corporate websites, each running on a different CMS. We have a single Privacy Policy that is reviewed, modified, and re-published by our legal department once a quarter. When this happens, the text of the policy should be pushed out to each website automatically (a rough sketch of this scenario follows the list).
  • Employees of our company submit Improvement Suggestions via a Word document. These are reviewed, metadata is added via document properties, and items worthy of further discussion are moved into a separate location by an admin assistant. Files in this location need to be consumed and automatically published to the Improvement Committee section of our intranet.
  • Our latest financial projections need to be published to the investor relations section of our website, and to seven different reporting services. Each service has slightly different formatting and composition requirements, so our financial projection content has to be able to adapt to each one.
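To make the Privacy Policy scenario concrete, here’s a minimal sketch of what the push could look like. The endpoint URLs and the payload format are hypothetical — every CMS exposes its own content API — but the pattern is the point: one canonical item, pushed to each delivery channel when it changes.

```csharp
// A minimal sketch of the "one privacy policy, many websites" scenario.
// The endpoints and payload shape are hypothetical; the pattern is what matters.
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class PrivacyPolicyPublisher
{
    // Hypothetical content APIs for the four corporate sites.
    private static readonly string[] Endpoints =
    {
        "https://site-one.example.com/api/content/privacy-policy",
        "https://site-two.example.com/api/content/privacy-policy",
        "https://site-three.example.com/api/content/privacy-policy",
        "https://site-four.example.com/api/content/privacy-policy"
    };

    public static async Task PushAsync(string policyHtml)
    {
        using (var client = new HttpClient())
        {
            foreach (var endpoint in Endpoints)
            {
                // Each target CMS may need slightly different formatting;
                // any adaptation of the canonical content would happen here.
                var payload = new StringContent(policyHtml, Encoding.UTF8, "text/html");
                var response = await client.PutAsync(endpoint, payload);
                response.EnsureSuccessStatusCode();
            }
        }
    }
}
```

In practice, the adaptation step is where most of the work lives: each target system wants the same content shaped slightly differently.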
Content management vendors tend to silently wage war against Content Integration by adding features to their systems in an effort to remove the need to go “outside” that system. In the first example above, WCMS vendors often built entire email messaging platforms into their systems to allow for this functionality in addition to the core web publishing.

This is done in the name of sales demos and competitive advantage, but weakens the product overall because no vendor can ever predict all the possible ways content can be re-used. (While it’s easy to blame vendors, the guilt can probably be laid at the feet of their customers, who—being ignorant of the concepts of Content Integration—have historically equated “built-in” with “superior.”)

To circle back to the original definition, Content Integration is multi-disciplinary. It encompasses:

  • Philosophy: How do we adopt the mindset that content is divorced from channel?  That message and medium are not the same thing, and a message can be carried over multiple media? How do we evangelize this philosophy to the entire organization?
  • Theories: What are the core paradigms of working with content? What is content, itself? What is a repository?  What is a channel?
  • Practices: How do we design content for integration? How do we manage it in such a way that it can be re-used? What governance and workflow situations arise from the usage of content in multiple locations?
  • Tools: What type of repository allows us to integrate our content easily? What channel products and services are designed for content integration? What content management systems allow for the easy import/export of content for re-use?
In the end, Content Integration is an umbrella over a collection of knowledge and technology, the combination of which allows us to get more value out of our content – to reach greater numbers of content consumers, at less cost, with greater control, and less risk.


RSG WCM Survey

February 10, 2015

Tony and the crew from Real Story Group have embarked on a broad survey of WCM usage and implementation patterns, which I think is worth taking.  The survey is here:

Survey: Web Content & Experience Management

I don’t think enough of this happens in the industry. As a group, we’re lacking in self-reflection and reporting.  Some of the questions are so basic, yet incredibly opaque from the outside.

If you complete the survey, you can elect to get a summary of the results. That alone makes it worthwhile.


Editorial Scripting in CMS

January 29, 2015

For years, I’ve been quite interested in the idea of scripting within a CMS.  By “within,” I mean scripting inside of managed content.  So: using some easily taught language or declarative syntax to get the CMS to perform actions when publishing content.

This clearly sounds weird, so here’s an example –

Say we have an editor who wants to display a dynamic table of data on a page.  This is data that comes from some DB-ish datasource outside the CMS.  Perhaps a list of locations, or something else.

Conventional practice gives us two options: we could (1) bring this data into the CMS itself, as managed content; or (2) we could leave the data where it is, and write custom code at the CMS level that connects to the database, then retrieves and formats the information.  Of course, either way requires us to do the dreaded “custom development” on our CMS implementation.

But is there perhaps another way?

Could we perhaps create a content type called “SQL Recordset” which contains an editor-controlled SQL query?  When this content renders, the SQL query is executed against a datasource, and the results are displayed as content.  The end consumer doesn’t know the actual “content” is the SQL query that generated it, but that’s not important.  Sure, our editor would have to understand basic SQL (only as it relates to this problem) and the structure of the datasource, but let’s pretend this is feasible.
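Here’s a rough sketch of what such a content type might look like. The class and property names are hypothetical (and in real life you’d want read-only database credentials), but the core idea is that the editor-managed “content” is just a query that gets executed and formatted at render time:

```csharp
// A sketch of the "SQL Recordset" idea: the managed content is an
// editor-supplied query, and rendering means executing it and formatting the results.
using System.Data.SqlClient;
using System.Text;

public class SqlRecordset
{
    public string ConnectionString { get; set; }  // set by the implementation, not the editor
    public string Query { get; set; }             // the editor-controlled "content"

    public string Render()
    {
        var html = new StringBuilder("<table>");
        using (var connection = new SqlConnection(ConnectionString))
        using (var command = new SqlCommand(Query, connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    html.Append("<tr>");
                    for (var i = 0; i < reader.FieldCount; i++)
                    {
                        html.AppendFormat("<td>{0}</td>", reader.GetValue(i));
                    }
                    html.Append("</tr>");
                }
            }
        }
        html.Append("</table>");
        return html.ToString();
    }
}
```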

Could we take this a step further by allowing the editor to supply a template, which is HTML with templating controls (à la Smarty, or Twig, or DotLiquid), and apply that template to the recordset?  Or return the SQL results as XML and transform it with XSL (ewwwww, I know…but it works)?  The resulting HTML might not even look like a SQL recordset – hell, a competent editor might make it come out as a blog.  Essentially, they’re content-managing scripts, which are executed at request time.
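For the templating half, here’s a minimal sketch using DotLiquid. The class name and the assumption that the recordset has already been loaded into row dictionaries are mine, not part of any product:

```csharp
// A sketch of applying an editor-supplied Liquid template to a recordset,
// assuming the data-loading step produces a list of row dictionaries.
using System.Collections.Generic;
using DotLiquid;

public static class RecordsetTemplating
{
    public static string Render(string liquidTemplate, IEnumerable<IDictionary<string, object>> rows)
    {
        var template = Template.Parse(liquidTemplate);

        // DotLiquid works with Hash objects, so each row is converted before rendering.
        var liquidRows = new List<Hash>();
        foreach (var row in rows)
        {
            liquidRows.Add(Hash.FromDictionary(row));
        }

        return template.Render(Hash.FromAnonymousObject(new { rows = liquidRows }));
    }
}
```

An editor’s template could then be something like `{% for row in rows %}<li>{{ row.Name }}</li>{% endfor %}`, with no developer involved past the initial wiring.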

Now, before you freak out (it’s probably too late), let me explain the reasons why this interests me –

First, there are different types of editors.  There are “normal” editors that just want to create content by filling out forms, and then there are “power editors,” who want as much control as they can get.  They’re not full-blown developers, but they have some concept of programming principles, enough that you could teach them a simple language and have them get results without them tying up a developer with a bunch of requests.

Second, there are different types of content problems. There are problems so foundational that you need a developer to solve them. But there are other problems which are just not that complicated, very idiosyncratic to the editor/content (meaning you’re not going to need to solve the same problem every day), and perhaps you just don’t need to re-build and re-deploy your CMS implementation to solve them.

You want to display the weather in Moscow on your intranet page?  Well, this is not a common request, so I’m not going to build a framework for it, and you’re just some random dude in the organization, so you don’t have the right to tell me to develop this and re-deploy the app.  But what if there was a simple scripting language inside your CMS which would enable you to make a call to the Open Weather Map API, extract the data you want, format the results and inject it into a content-managed page?  Would that work?

Third, even if I’m a trained developer, some problems are so simple that perhaps we should solve them at a level that doesn’t require us to mess with the “foundational” code of the implementation.  What if we split our implementation into “foundational” and “editorial” layers, and decided that we could solve some problems in the editorial layer?

For a highly dynamic implementation (think intranet), perhaps the core CMS implementation itself is more of a framework, and we have an embedded scripting container to solve highly specific, one-off problems at the content/editorial level, rather than the code level.  Perhaps there can be another category of lightweight developer that can solve simple problems that editors have without having to escalate to a “full” developer?

Yes, there are numerous issues here, and the idea of editors having access to a programming environment is a little scary, but I’m curious to see how viable this is.  To what extent could editors be trained on, understand, and use some simple scripting tool?

Lately, I’ve been playing with some ideas.

  • The first one is a “text filter pipeline” (it really needs a better name) which grew out of the development of a simple file include-ish feature for EPiServer.  An early version is on GitHub. The idea is an extremely simple scripting-ish language that editors can use to inject external data into managed content.  I’ve kept the language as simple as possible, while still making it fairly powerful and extensible.  It’s still very much in development, but take a look at the README for an example of what I’m talking about (and a working example of the “weather” scenario I mentioned above).
  • The second one is straight up server-side JavaScript injection.  I’m playing around with Jurassic, and I have a prototype of server-side JavaScript executed at request time within EPiServer HTML content (technically, in a SCRIPT tag with a “type” of “text/server-js”).  The difficulty is exposing a read-only EPiServer API to the JavaScript, but I’m getting there (a rough sketch of the execution side follows this list).  It’s quite possible, and ECMAScript 3 would give an editor an essentially Turing-complete language in which to do…stuff.
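To give a feel for the second idea, here’s a stripped-down sketch of the execution side: find the server-side SCRIPT blocks in managed HTML, run them through Jurassic, and replace each block with whatever the script evaluates to. This is not my actual prototype (and it leaves out the read-only EPiServer API entirely); it’s just the general shape of it.

```csharp
// A sketch: replace <script type="text/server-js"> blocks in managed HTML
// with the result of executing their contents through Jurassic at request time.
using System.Text.RegularExpressions;
using Jurassic;

public static class ServerJsFilter
{
    private static readonly Regex ScriptBlock = new Regex(
        "<script[^>]*type=\"text/server-js\"[^>]*>(?<code>.*?)</script>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static string Process(string html)
    {
        return ScriptBlock.Replace(html, match =>
        {
            var engine = new ScriptEngine();

            // A read-only API would be exposed to the script here, e.g.:
            // engine.SetGlobalValue("currentPage", someReadOnlyWrapper);

            // The script's final expression becomes the injected markup.
            var result = engine.Evaluate(match.Groups["code"].Value);
            return result == null ? string.Empty : result.ToString();
        });
    }
}
```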
Yeah, yeah – a lot of you are freaking out right now.  I get it, and I’m not saying this isn’t fraught with potential security, training, and governance issues.  But it’s interesting as hell, and I’m determined to see just how viable it is from a practical standpoint.

Also, I know that this isn’t new.  I have seen things like this before (DekiScript for MindTouch, for instance).  I don’t think I’ve seen it done really well, and perhaps there’s a reason for that.

Even if this doesn’t work out how I hope it will, I stand to learn a lot about the average CMS editor, what they want, and where their threshold of complexity lies.

Stay tuned.


Accidental Bitcoin Centralization

January 23, 2015

Blockchain scalability: As Bitcoin gets bigger, the history of transactions (which is required to make the whole thing work) gets less manageable, leading to centralization, which is the antithesis of the whole idea.

We can already observe empirically that more than 50% of the hashpower securing the network right now is owned by just five entities – see figure 1. This is a real security threat. Five is a small enough number that state-level actors could directly coerce all five entities without too much trouble. Five is also small enough that active collusion would be fairly easy to coordinate.

It’s not getting better.

The bitcoin blockchain is presently about 25 GB in size. Downloading the blockchain peer-to-peer takes about 48 hours, and of course 25 GB of disk space. This is a serious user experience flaw…


We Suck at HTTP

January 7, 2015

I absolutely loved this New York Times column which lamented the world of apps, where we don’t have the capability to link to content anymore:

Unlike web pages, mobile apps do not have links. They do not have web addresses. They live in worlds by themselves, largely cut off from one another and the broader Internet. And so it is much harder to share the information found on them.

Yes, yes, for the love of God yes.

We have broken HTTP.  We’ve done it for years in fits and starts, but apps have completely broken it.  HTTP was a good specification which we’ve steadily whittled away.

URLs have a purpose.  We are very cavalier about that purpose. We don’t use canonicals. We’re sloppy about switching back and forth between HTTP and HTTPS.  We don’t bother to logically structure our URLs.  We rebuild websites and let all the links break. We don’t appreciate that crawlers are dumb and they need more context than humans.

Did you know there’s something called a URN – Uniform Resource Name?  This was supposed to be one level above a URL.  Your resource would have a URN, which would be a global identifier, and it would resolve to a URL which was just where the resource was located right now.  URNs never caught on, but the web would be better if they had.  Content could then have a “name” which was matched to it forever, regardless of its current URL.  (The “guid” element in RSS probably should have been named “urn,” in fact.)

And it’s not just URLs.  HTTP status codes exist for a reason too.  Did you know that there are a lot of them?  In fact, there’s one for about everything that could happen for a web request.  Did you know there’s a difference between 404 and 410?  404 (“Not Found”) just means the server can’t find the resource; it says nothing about whether it ever existed.  410 (“Gone”) means it was once here but has been deliberately removed.  Big difference.

Ever hear of 303 and 307?  The human-readable descriptions are “See Other” and “Temporary Redirect”: 303 tells a client to go fetch a different resource with a GET, and 307 sends it somewhere else temporarily without changing the request method, which is handy for things like mirrors and load redirection.  Did you know there was a “402 Payment Required”?  There’s a bunch that were just never implemented. These days a lot of websites just return “200 OK” for everything, even 404s, which drives me freaking nuts.  (And, yes, I’m sure I’ve done it, so don’t go looking too hard through my portfolio…)

(A new company called Words API (it’s an API…for words) made me jump for joy when I saw they are using actual, intelligent HTTP status codes on their responses, even their errors.  If you go over your usage limit, for example, you get a “429 Too Many Requests” back. Good for them.)
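As a sketch of what “actual, intelligent status codes” looks like in practice, here’s a minimal ASP.NET MVC-style controller. The article model and repository are hypothetical placeholders; the point is that the response code should describe what actually happened instead of defaulting to 200.

```csharp
using System.Net;
using System.Web.Mvc;

// Hypothetical content model and lookup -- stand-ins for whatever repository you actually use.
public class Article
{
    public string Title { get; set; }
    public bool IsRetired { get; set; }
    public string MovedToSlug { get; set; }
}

public static class ArticleRepository
{
    public static Article FindBySlug(string slug)
    {
        return null; // stubbed out; a real implementation would query the CMS or database
    }
}

public class ArticleController : Controller
{
    public ActionResult Show(string slug)
    {
        var article = ArticleRepository.FindBySlug(slug);

        if (article == null)
            return new HttpStatusCodeResult(HttpStatusCode.NotFound);     // 404: the server can't find it

        if (article.IsRetired)
            return new HttpStatusCodeResult(HttpStatusCode.Gone);         // 410: it was here, now deliberately removed

        if (article.MovedToSlug != null)
            return RedirectPermanent("/articles/" + article.MovedToSlug); // 301, not a soft "200 with a sorry page"

        return View(article);                                             // 200: genuinely OK
    }
}
```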

Do you know why your FORM tag has an attribute called “method”?  Because you’re calling a method on a web server, like a method on an object in OOP.  Did you know there are other methods besides GET and POST?  There’s HEAD and OPTIONS and PUT and DELETE.  And you can write your own.  So if you’re passing data back and forth between your app/site and your web server, you’re welcome to name custom methods in the request line.

And, technically, you’re supposed to make sure GET requests are safe and idempotent, meaning they can be repeated without changing anything on the server.  So you should be able to hit Refresh all day on a GET request without causing any data change (beyond perhaps analytics).  If you’re changing data on a server, that should always be a POST request (or PUT or DELETE, if anyone ever used them as intended).
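Here’s a small sketch of exercising some of those other methods from .NET. The URL is a placeholder; the point is that the method is just a token on the request line, and the standard HttpClient will happily send HEAD, PUT, DELETE, or a custom method you made up.

```csharp
// A sketch of using HTTP methods beyond GET and POST with HttpClient.
using System.Net.Http;
using System.Threading.Tasks;

public static class HttpMethodDemo
{
    public static async Task RunAsync()
    {
        using (var client = new HttpClient())
        {
            var url = "https://api.example.com/widgets/42";   // placeholder resource

            // Reads: GET should be safe to repeat all day.
            await client.GetAsync(url);

            // Changes: use PUT/DELETE (or POST) so refreshing a GET never mutates anything.
            await client.PutAsync(url, new StringContent("{ \"name\": \"new name\" }"));
            await client.DeleteAsync(url);

            // Metadata only: HEAD returns the headers without the body.
            await client.SendAsync(new HttpRequestMessage(HttpMethod.Head, url));

            // A custom method is just a different token on the request line.
            await client.SendAsync(new HttpRequestMessage(new HttpMethod("PURGE"), url));
        }
    }
}
```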

I could go on and on.  Don’t even get me started about URL parameters. No, not querystrings – there was originally a specification where you could do something like “/key1:value1/key2:value2/” to pass data into a request. And what about the series of “UA-*” headers that existed to tell the web server information about the rendering capabilities of the user agent?  (And dare I wander off into metadata-related ranting…two words people, Dublin Core!)

My point is that a lot of web developers today are completely ignorant of the protocol that is the basis for their job.  A core understanding of HTTP should be a base requirement for working in this business.  To not do that is to ignore a massive part of digital history (which we’re also very good at).

I’m currently working through HTTP: The Definitive Guide by O’Reilly.  The book was written in 2002, but HTTP hasn’t changed much since then.  It’s fascinating to read all the features built into HTTP that no one uses because they were never adopted or no one bothered to do some research before they re-solved a problem. There’s a lot of stuff in there that solves problems we’ve since programmed our way around.  The designers of the spec were probably smarter than you, it turns out.

(HTTP/2 is currently proposed, but it doesn’t change much of the high level stuff.  The changes are mostly low-level data transport hacks, based on Google’s experience with SPDY.)

At risk of sounding like a crabby old man (I’m 43 and have been developing for the web since 1996), this is one small symptom of a larger problem – developers tend to think they can solve every problem, and they’re pretty sure that nothing good happened before they arrived on the scene. Anyone working in this space 20 years ago couldn’t possibly have understood their problems, so every problem deserves a brand-new solution.

Developers often don’t know what they don’t know (that link goes to my personal confession of this exact thing), and they feel no need to study the history of their technology to gain some context about it.  Hell, we all need to sit and read The Innovators together.

Narcissism runs rampant in this industry, and our willingness to throw away and ignore some of the core philosophies of HTTP is just one manifestation of this.  Rant over.