Accidental Bitcoin Centralization

By on January 23, 2015

Blockchain scalability: As Bitcoin gets bigger, the history of transactions (which is required to make the whole thing work) gets less manageable, leading to centralization, which is the antithesis of the whole idea.

We can already observe empirically that more than 50% of the hashpower securing the network right now is owned by just five entities – see figure 1. This is a real security threat. Five is a small enough number that state-level actors could directly coerce all five entities without too much trouble. Five is also small enough that active collusion would be fairly easy to coordinate.

It’s not getting better.

The bitcoin blockchain is presently about 25 GB in size. Downloading the blockchain peer-to-peer takes about 48 hours, and of course 25 GB of disk space. This is a serious user experience flaw…


We Suck at HTTP

By on January 7, 2015

I absolutely loved this New York Times column which lamented the world of apps, where we don’t have the capability to link to content anymore:

Unlike web pages, mobile apps do not have links. They do not have web addresses. They live in worlds by themselves, largely cut off from one another and the broader Internet. And so it is much harder to share the information found on them.

Yes, yes, for the love of God yes.

We have broken HTTP.  We’ve done it for years in fits and starts, but apps have completely broken it.  HTTP was a good specification which we’ve steadily whittled away.

URLs have a purpose.  We are very cavalier about that purpose. We don’t use canonicals. We’re sloppy about switching back and forth between HTTP and HTTPS.  We don’t bother to logically structure our URLs.  We rebuild websites and let all the links break. We don’t appreciate that crawlers are dumb and they need more context than humans.

Did you know there’s something called a URN – Uniform Resource Name?  This was supposed to be one level above a URL.  Your resource would have a URN, which would be a global identifier, and it would resolve to a URL which was just where the resource was located right now.  URNs never caught on, but the web would be better if they had.  Content could then have a “name” which was matched to it forever, regardless of its current URL.  (The “guid” element in RSS probably should have been named “urn,” in fact.)

And it’s not just URLs.  HTTP status codes exist for a reason too.  Did you know that there are a lot of them?  In fact, there’s one for about everything that could happen for a web request.  Did you know there’s a difference between 404 and 410?  404 (“Not Found”) means the server can’t find it right now and isn’t saying whether it ever existed or will come back.  410 (“Gone”) means it was once here and has been removed for good.  Big difference.

Ever hear of 303 and 307?  They’re both redirects: 303 (“See Other”) sends the client to fetch a different URL with GET, typically after a form POST, and 307 (“Temporary Redirect”) tells it to repeat the same request, method and all, at another URL for now.  Did you know there was a “402 Payment Required”?  There’s a bunch that were just never implemented. These days a lot of websites just return “200 OK” for everything, even 404s, which drives me freaking nuts.  (And, yes, I’m sure I’ve done it, so don’t go looking too hard through my portfolio…)

(A new company called Words API (it’s an API…for words) made me jump for joy when I saw they are using actual, intelligent HTTP status codes on their responses, even their errors.  If you go over your usage limit, for example, you get a “429 Too Many Requests” back. Good for them.)
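To make the status-code point concrete, here is a minimal sketch in Python of a site choosing an honest code instead of answering “200 OK” for everything. The page sets, the rate limit, and the `status_for` function are all made up for illustration; only the `HTTPStatus` constants come from the standard library.

```python
from http import HTTPStatus

# Hypothetical site state, invented for this illustration.
LIVE_PAGES = {"/about", "/blog/we-suck-at-http"}
RETIRED_PAGES = {"/blog/old-post"}   # deliberately removed for good
RATE_LIMIT = 3                       # requests allowed per client


def status_for(path, requests_so_far=0):
    """Pick a meaningful status code for a request to `path`."""
    if requests_so_far >= RATE_LIMIT:
        return HTTPStatus.TOO_MANY_REQUESTS   # 429
    if path in LIVE_PAGES:
        return HTTPStatus.OK                  # 200
    if path in RETIRED_PAGES:
        return HTTPStatus.GONE                # 410
    return HTTPStatus.NOT_FOUND               # 404


for path in ("/about", "/blog/old-post", "/nope"):
    code = status_for(path)
    print(path, code.value, code.phrase)
```

A crawler hitting “/blog/old-post” learns it can drop the URL from its index, which a blanket 200 (or even a 404) would never tell it.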

Do you know why your FORM tag has an attribute called “method”?  Because you’re calling a method on a web server, like a method on an object in OOP.  Did you know there are other methods besides GET and POST?  There’s HEAD and OPTIONS and PUT and DELETE.  And you can write your own.  So if you’re passing data back and forth between your app/site and your web server, you’re welcome to name custom methods in the request line (the first line of the request).

And, technically, you’re supposed to make sure GET requests are safe, meaning they can be repeated with no changes to a resource.  So you should be able to hit Refresh all day on a GET request without causing any data change (beyond perhaps analytics).  If you’re changing data on a server, that should always be a POST request (or PUT or DELETE, if anyone ever used them as intended).
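Here is a rough sketch of both ideas using nothing but Python’s standard library: a throwaway local server where GET only reads, POST changes data, and a made-up “RESET” method is dispatched just like the built-in ones. The counter, the `/counter` path, the `call` helper, and the RESET and FROB method names are all invented for this demo.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

counter = {"value": 0}  # the only server-side state in this demo


class Handler(BaseHTTPRequestHandler):
    def _reply(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Reading the counter never changes it, so Refresh is harmless.
        self._reply(200, str(counter["value"]).encode())

    def do_POST(self):
        # Every POST changes state.
        counter["value"] += 1
        self._reply(204)

    def do_RESET(self):
        # "RESET" is an invented method; http.server dispatches on
        # whatever method token appears in the request line.
        counter["value"] = 0
        self._reply(204)

    def log_message(self, *args):
        pass  # keep the demo quiet


def call(port, method, path="/counter"):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, path)
    response = conn.getresponse()
    body = response.read().decode()
    conn.close()
    return response.status, body


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS pick
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

call(port, "POST")
call(port, "POST")
first = call(port, "GET")
second = call(port, "GET")                     # same answer, nothing changed
after_reset = (call(port, "RESET"), call(port, "GET"))
unknown = call(port, "FROB")                   # no do_FROB handler defined
server.shutdown()
server.server_close()

print(first, second, after_reset, unknown[0])
```

The last call is the flip side: a server that doesn’t recognize a method has a status code for that too (501 Not Implemented).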

I could go on and on.  Don’t even get me started about URL parameters. No, not querystrings – there was originally a specification where you could do something like “/path;key1=value1;key2=value2” to pass data into a request. And what about the series of “UA-*” headers that existed to tell the web server information about the rendering capabilities of the user agent?  (And dare I wander off into metadata-related ranting…two words, people: Dublin Core!)

My point is that a lot of web developers today are completely ignorant of the protocol that is the basis for their job.  A core understanding of HTTP should be a base requirement for working in this business.  To not do that is to ignore a massive part of digital history (which we’re also very good at).

I’m currently working through HTTP: The Definitive Guide by O’Reilly.  The book was written in 2002, but HTTP hasn’t changed much since then.  It’s fascinating to read all the features built into HTTP that no one uses because they were never adopted or no one bothered to do some research before they re-solved a problem. There’s a lot of stuff in there that solves problems we’ve since programmed our way around.  The designers of the spec were probably smarter than you, it turns out.

(HTTP/2 is currently proposed, but it doesn’t change much of the high level stuff.  The changes are mostly low-level data transport hacks, based on Google’s experience with SPDY.)

At risk of sounding like a crabby old man (I’m 43 and have been developing for the web since 1996), this is one small symptom of a larger problem – developers tend to think they can solve every problem, and they’re pretty sure that nothing good happened before they arrived on the scene. Anyone working in this space 20 years ago couldn’t possibly have known of their problems so every problem deserves a new solution.

Developers often don’t know what they don’t know (that link goes to my personal confession of this exact thing), and they feel no need to study the history of their technology to gain some context about it.  Hell, we all need to sit and read The Innovators together.

Narcissism runs rampant in this industry, and our willingness to throw away and ignore some of the core philosophies of HTTP is just one manifestation of this.  Rant over.


America’s CTO

By on January 5, 2015

Adviser Guides Obama Into the Google Age: A profile of the new U.S. CTO Megan Smith. The transition is rocky:

Not only does she now carry a BlackBerry, she uses a 2013 Dell laptop: new by government standards, but clunky enough compared with the cutting-edge devices of her former life that her young son asked what it was.

Additionally, the position is suspected of being nothing but a figurehead.

The problem, technology experts say, is that the mandate of the chief technology officer has been nebulous since Mr. Obama created the job five years ago, not least because it does not come with a substantial funding stream, a crucial source of power in the government.

No money, no power.


Qualcomm: The Monopoly You’ve Never Heard Of?

By on January 5, 2015

The title of this post has nothing to do with this point, which I found interesting:

[…] no one seems to be paying attention to Qualcomm’s incredible chipset dominance in mobile. Android and Snapdragon look an awful lot like Windows and Intel; every hardware maker except for Apple is beholden to the two giants behind the platform. (And even Apple uses Qualcomm’s LTE radios and other chips in the iPhone.) Qualcomm has an incredible patent moat and it seems to be pushing Snapdragon along right on schedule, but that’s exactly where Intel was before the smartphone explosion sent its roadmap spinning wildly off course. The coming explosion of internet of things devices, wearables, and other sensor-laden gadgets is a huge opportunity for every company that failed in mobile, and a dangerous moment for Qualcomm.


Crypto-Hacking Case Study

By on January 4, 2015

How My Mom Got Hacked: Interesting look at a case of crypto-hacking. Turns out that actually paying the ransom was the difficult part.

By the time my mom called to ask for my help, it was already Day 6 and the clock was ticking. (Literally — the virus comes with a countdown clock, ratcheting up the pressure to pay.) My father had already spent all week trying to convince her that losing six months of files wasn’t the end of the world (she had last backed up her computer in May). It was pointless to argue with her. She had thought through all of her options; she wanted to pay.


It’s a Young Developer’s World

By on December 24, 2014

Silicon Valley’s Youth Problem: Silicon Valley and start-up culture are dysfunctional.  So many great quotes in this piece.

[…] what matters most is not salary, or stability, or job security, but cool. Cool exists at the ineffable confluence of smart people, big money and compelling product. You can buy it, but only up to a point. For example, Microsoft, while perpetually cast as an industry dinosaur, is in fact in very good financial shape. Starting salaries are competitive with those at Google and Facebook; top talent is promoted rapidly. Last year, every Microsoft intern was given a Surface tablet; in July, they were flown to Seattle for an all-expenses-paid week whose activities included a concert headlined by Macklemore and deadmau5.  Despite these efforts, Microsoft’s cool feels coerced.

As a guy pushing 44 years old, this resonates:

If you are 50, no matter how good your coding skills, you probably do not want to be called a “ninja” and go on bar crawls every weekend with your colleagues, which is exactly what many of my friends do.

And what are we losing for the world when the top technical talent wants to work at companies that do – let’s face it – stupid, meaningless things?

Why do these smart, quantitatively trained engineers, who could help cure cancer or fix, want to work for a sexting app?

[sigh] I’m old.


Bitcoin for the Befuddled

By on December 7, 2014

I got a review copy of “Bitcoin for the Befuddled” from No Starch Press (a publisher I’ve really enjoyed over the years).  The title is an accurate description of where I’m at on Bitcoin – I have a basic understanding of it, but the intricacies are escaping me.

Sadly, Dunning-Kruger being what it is, reading the book left me more confused, but most likely in a good way – at least now I can grasp the scope and depth of what I don’t understand.  I’m in a better position to figure things out after reading the book, even if I don’t have all the answers now. (You can’t Google “blockchain” without knowing that the concept exists and is “Google-able”…)

Here’s a random sample of what I learned (in an attempt to cement my own knowledge, if nothing else):

  • Bitcoin is a weird thing. Creating it was almost a magic discovery of some intersection between cryptography, economics, and game theory. The entire thing seems both precarious and stable, like a perfectly interlocking house of cards. It’s like a Mexican Standoff that works to everyone’s benefit.
  • The guy (girl?) who “discovered” it did so in a 2008 paper anonymously published to a cryptography mailing list. He is known as “Satoshi,” but no one knows who he really is. He hasn’t been heard from in years. (Sounds like a movie plot, I know.)  He claimed to be a 37-year-old man from Japan, but many people don’t believe that.
  • The core logical basis of Bitcoin is that the ledger – the entire history of transactions – is public and everyone has it. So everyone can recreate the entire history of Bitcoin transactions, and everyone confirms that it’s valid every 10 minutes. With this process in place, no one can cheat the system because the entire history of the currency is in the open.
  • This ledger is known as “the blockchain,” which is a – wait for it – chain of blocks, which are packets of information. Every block carries a hash of the block before it.  Think about that, for a second – if each block verifies the one before it, then you can trace the validity of the chain backwards to the very first block/transaction (the “genesis block”).  The validation of the last block in the chain (a new one is generated every 10 minutes or so) effectively validates the entire chain. That is just elegantly beautiful.
  • There’s an astonishing amount of cryptography involved. Without crazy math, Bitcoin wouldn’t exist.
  • There’s almost an equal amount of game theory involved.  There are weaknesses in the system that are covered by other strengths which make pursuing the weaknesses non-profitable and therefore pointless.
  • Storing Bitcoins can be a potentially complicated thing, depending on how much you have to store and how secure you want it. It can involve offline computers, “hot” or “cold” wallets, and even…
  • …paper. Bitcoins can be stored on paper. So long as you can codify a cryptographic key value as a QR code, then there’s nothing stopping you from printing out $1 billion in Bitcoins, throwing away all digital record of it, and turning your filing cabinet into the most valuable piece of furniture in history.
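The “chain of blocks” idea from the list above is easy to play with in code. This is a toy sketch in Python, assuming each block is just a dictionary holding some data plus the SHA-256 hash of the previous block. Real Bitcoin blocks are far more involved (binary headers, Merkle trees, proof-of-work), and the names `block_hash`, `add_block`, and `is_valid` are mine, not anything from the book.

```python
import hashlib
import json


def block_hash(block):
    """Hash a block's full contents, including its link to the previous block."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def add_block(chain, data):
    """Append a block that records the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else "0" * 64  # genesis has no parent
    chain.append({"prev_hash": prev, "data": data})


def is_valid(chain):
    """Check that every block still matches the hash stored in its successor."""
    for earlier, later in zip(chain, chain[1:]):
        if later["prev_hash"] != block_hash(earlier):
            return False
    return True


chain = []
for data in ("genesis", "alice pays bob 1", "bob pays carol 2"):
    add_block(chain, data)

print(is_valid(chain))                    # True
chain[1]["data"] = "alice pays bob 100"   # try to rewrite history
print(is_valid(chain))                    # False
```

Changing any old block changes its hash, which breaks the link stored in the next block, so a cheater would have to rebuild every block after it.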
I’m still fuzzy on some other things:
  • How big does the blockchain get before it’s unwieldy?  The chain is something like 20GB now, which means a full copy of it might take days to download to populate your wallet.  This will only get worse as Bitcoin gets more and more popular. (I did some research – people have put a lot of thought into this.)
  • Doesn’t the system require Internet access?  This point seems obvious, but I’m wondering how much of a limitation this would be with wider adoption.
  • How do nodes on the Bitcoin network connect? This is glossed over – nodes just seem to magically find each other.  I assume they connect over some standard port/protocol, but then wouldn’t that be ripe for a DoS attack? I know the book wasn’t intended as a networking reference, but at least some information on that would have helped me visualize it in my head.
Overall, the book is well-done.  It does a nice job of gauging where you might be at, and attacking the problem from multiple sides.  Bitcoin is a frustratingly slippery thing – you think you have it figured out, then the zen of it falls out of your head for a second and you have to fight to get it back. I’m sure there are people for whom this is all crystal clear, but I am not one of them.

The book has narrative sections, and interestingly, a full-length comic book right in the middle of it. There’s a chapter about the cryptographic basis of Bitcoin that had a lot of math and graphs. I admit to skimming that one a bit.  There’s also perhaps an over-abundance of analogies. You start confusing them for each other after a while.

(Also, weirdly, the book is full of typographical errors.  I found three of them in a two-paragraph stretch, at one point.)

All in all, the book fulfilled its promise.  I was befuddled.  I’m still a little confused, but I’m light years ahead of where I was. Let’s call this book a primer – it gets you started, and gives the basic knowledge required to learn more, if that’s what you decide to do. Honestly, I feel like I probably know enough at this point to trust Bitcoin and perhaps become a user of it.  If I ever decided to mine it (something the book highly discourages) or develop against it, I would clearly need to know more.

But, for now, this is enough.

(If you want to see the first two chapters, which is where some of the core theory lies, the page at No Starch has a free download.)


Why Nigerian 419 Scam Emails Suck

By on December 2, 2014

You know how when you get a Nigerian scam email, and you read it, and you’re like “who the hell falls for these?”

I wondered, so I tweeted that I’d like to see a good content strategist re-write one of these emails to be super effective.  Someone responded and pointed me to this study from Microsoft: “Why do Nigerian Scammers Say They are from Nigeria?”

There’s a lot of math and stats in there, but here’s the gist – they’re designed to suck, so that they only attract the most gullible people.

By sending an email that repels all but the most gullible the scammer gets the most promising marks to self-select, and tilts the true to false positive ratio in his favor.

If you’re still on-board after reading a pitch that bad, then there’s a good chance you’ll stay on-board all throughout the long process to separate you from your money.  People who can instantly sniff out a scam are low-payoff targets, and waste the scammer’s time.


The “Import and Update” Pattern

By on November 12, 2014

Almost every CMS supports content import, to some extent. There’s always an API and often a web service for you to fire content into a system from the outside.

But a model we see over and over that really needs to be explicitly acknowledged is that of “import and update.” This means, create new content if it doesn’t exist, but update the content in-place if it was previously created. It’s used to support instances when we’re syncing information stored inside the CMS with information stored outside the CMS.

For example, let’s say our hospital maintains its physician profiles in a separate database (for whatever reason). However, we need our physicians to have managed content objects inside the CMS, for a variety of reasons (for a list of why this is handy, see my post on proxy objects in CMS).

We can easily write a job to import our physician profiles, but what happens when they update in the source database? We don’t want to import again, we just want to update the page inside the CMS. Sure, we could delete it and recreate it, but that becomes problematic when it might change the URL, or increment a set of ID numbers, or even delete information in the CMS which is referencing that specific content object (analytics, for example).

EPiServer has a “Content Channel” architecture that handles this.  You fire a dictionary of key-value pairs (representing content properties and their values) at a web service.  You can optionally include the GUID of an existing content object.  No GUID means EPiServer will create a new object, while data coming in with a GUID will find the corresponding page and update it with the incoming information. It essentially keeps the content object shell, but overwrites all the information in it.

With any system like this, you need to maintain a mapping between the ID outside the CMS, and the ID inside the CMS.  You need to know that Database Record #654 is updating Content ID #492. When iterating your database rows, when you run across ID #654, you know to reference ID #492 when talking to the CMS. You also need to be able to get the newly created ID back out of the CMS when content is created, so you can create a mapping for it – if my CMS creates Content ID #732, I need to know this so I can reference it later.
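The whole pattern fits in a few lines. Below is a minimal sketch in Python, with an in-memory `FakeCMS` class standing in for whatever web service your CMS exposes. It is not EPiServer’s Content Channel API or any real product’s. The class, `import_and_update`, and the sample physician rows are all hypothetical, and I’ve assumed an update merges incoming properties so that CMS-only properties survive.

```python
import itertools


class FakeCMS:
    """In-memory stand-in for a CMS web service (hypothetical, not a real API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.content = {}

    def create(self, properties):
        content_id = next(self._ids)
        self.content[content_id] = dict(properties)
        return content_id  # hand the new ID back so the caller can map it

    def update(self, content_id, properties):
        # Keep the content object shell (same ID); merge in the incoming values.
        self.content[content_id].update(properties)


def import_and_update(cms, id_map, source_rows):
    """Create content for unseen source IDs, update in place for known ones."""
    for row in source_rows:
        source_id = row["id"]
        properties = {k: v for k, v in row.items() if k != "id"}
        if source_id in id_map:
            cms.update(id_map[source_id], properties)
        else:
            id_map[source_id] = cms.create(properties)
    return id_map


cms = FakeCMS()
id_map = {}  # source database ID -> CMS content ID; persist this between runs

import_and_update(cms, id_map, [
    {"id": 654, "name": "Dr. Smith", "specialty": "Cardiology"},
    {"id": 655, "name": "Dr. Jones", "specialty": "Oncology"},
])
cms.content[id_map[654]]["cms_only_bio"] = "Added by an editor"

# Second sync: record 654 changed in the source database.
import_and_update(cms, id_map, [
    {"id": 654, "name": "Dr. Smith", "specialty": "Electrophysiology"},
])
print(id_map)
print(cms.content[id_map[654]])
```

The second run touches the same content ID as the first, so URLs and anything referencing that object stay intact. The mapping table is the piece you have to store somewhere durable.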

Some CMS offer “content provider” models, which are real-time methods to “mount” other repositories.  So, instead of importing and updating this data, the CMS reaches out to our external database in real-time when required to get objects back and mock them up as first-order content objects.

This is certainly elegant and sophisticated, but it presents problems with performance, uptime of the source system, unnecessary computational overhead if the content doesn’t change much, network topology and unbroken connectivity, and the inability to extend the content with new data inside the CMS (for instance, while 90% of the information about our physicians comes from the external database, perhaps we have a couple of properties that live inside the CMS only).

I hope I see this pattern more often. EPiServer has it, eZ publish has it, and I’m sure many others do too. Additionally, it’s not hard to build it. If you can put together a web service, you should be able to pull it off.

It’s a handy thing to have.


Metadata Depends on Perspective

By on November 12, 2014

I’m reading The Discipline of Organizing. Early in the book, the author talks about “metadata,” which is a topic I’ve complained about before (go read those; I’ll wait). When it comes to web content management, I think it’s hard to differentiate between the “first order data” and the “metadata.” Which is which?

The author calls it even further into question by introducing the perspective of the observer.

[…] what serves as metadata for one person or process can function as a primary resource or data for another one. Rather than being an inherent distinction, the difference between primary and associated resources is often just a decision about which resource we are focusing on in some situation. An animal specimen in a natural history museum might be a primary resource for museum visitors and scientists interested in anatomy, but information about where the specimen was collected is the primary resource for scientists interested in ecology or migration.