Bitcoin for the Befuddled

December 7, 2014

I got a review copy of “Bitcoin for the Befuddled” from No Starch Press (a publisher I’ve really enjoyed over the years).  The title is an accurate description of where I’m at on Bitcoin – I have a basic understanding of it, but the intricacies are escaping me.

Sadly, Dunning-Kruger being what it is, reading the book left me more confused, but most likely in a good way – at least now I can grasp the scope and depth of what I don’t understand.  I’m in a better position to figure things out after reading the book, even if I don’t have all the answers now. (You can’t Google “blockchain” without knowing that the concept exists and is “Google-able”…)

Here’s a random sample of what I learned (in an attempt to cement my own knowledge, if nothing else):

  • Bitcoin is a weird thing. Creating it was almost a magic discovery of some intersection between cryptography, economics, and game theory. The entire thing seems both precarious and stable, like a perfectly interlocking house of cards. It’s like a Mexican Standoff that works to everyone’s benefit.
  • The guy (girl?) who “discovered” it did so in a 2008 paper anonymously published to a cryptography mailing list. He is known as “Satoshi,” but no one knows who he really is. He hasn’t been heard from in years. (Sounds like a movie plot, I know.)  He claimed to be a 37-year-old man from Japan, but many people don’t believe that.
  • The core logical basis of Bitcoin is that the ledger – the entire history of transactions – is public and everyone has it. So everyone can recreate the entire history of Bitcoin transactions, and everyone confirms that it’s valid every 10 minutes. With this process in place, no one can cheat the system because the entire history of the currency is in the open.
  • This ledger is known as “the blockchain,” which is a – wait for it – chain of blocks, which are packets of information. Every block has a hashcode from the block before it.  Think about that for a second – if each block verifies the one before it, then you can trace the validity of the chain backwards to the very first block/transaction (the “genesis block”).  The validation of the last block in the chain (a new one is generated every 10 minutes or so) effectively validates the entire chain. That is just elegantly beautiful.
  • There’s an astonishing amount of cryptography involved. Without crazy math, Bitcoin wouldn’t exist.
  • There’s almost an equal amount of game theory involved.  There are weaknesses in the system that are covered by other strengths which make pursuing the weaknesses non-profitable and therefore pointless.
  • Storing Bitcoins can be a potentially complicated thing, depending on how much you have to store and how secure you want it. It can involve offline computers, “hot” or “cold” wallets, and even…
  • …paper. Bitcoins can be stored on paper. So long as you can codify a cryptographic key value as a QR code, then there’s nothing stopping you from printing out $1 billion in Bitcoins, throwing away all digital record of it, and turning your filing cabinet into the most valuable piece of furniture in history.
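The hash-chaining idea from the list above can be sketched in a few lines of code. This is an illustration of the linking concept only – real Bitcoin blocks carry transactions, nonces, difficulty targets, and Merkle trees, none of which appear here:

```python
import hashlib

def block_hash(prev_hash, data):
    """Hash a block's contents together with the previous block's hash."""
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()

# Build a tiny chain starting from a "genesis" block.
chain = [{"prev": "0" * 64, "data": "genesis"}]
chain[0]["hash"] = block_hash(chain[0]["prev"], chain[0]["data"])

for data in ["alice pays bob 1", "bob pays carol 2"]:
    prev = chain[-1]["hash"]
    chain.append({"prev": prev, "data": data, "hash": block_hash(prev, data)})

def chain_is_valid(chain):
    """Re-derive every hash; tampering with any earlier block breaks all later links."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block["prev"], block["data"]):
            return False
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False
    return True

print(chain_is_valid(chain))             # True
chain[1]["data"] = "alice pays bob 999"  # rewrite history...
print(chain_is_valid(chain))             # False -- the chain no longer verifies
```

Changing one old block invalidates everything after it, which is why validating the newest block effectively validates the whole history back to the genesis block.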
I’m still fuzzy on some other things:
  • How big does the blockchain get before it’s unwieldy?  The chain is something like 20GB now, which means a full copy of it might take days to download to populate your wallet.  This will only get worse as Bitcoin gets more and more popular. (I did some research – people have put a lot of thought into this.)
  • Doesn’t the system require Internet access?  This point seems obvious, but I’m wondering how much of a limitation this would be with wider adoption.
  • How do nodes on the Bitcoin network connect? This is glossed over – nodes just seem to magically find each other.  I assume they connect over some standard port/protocol, but then wouldn’t that be ripe for a DoS attack? I know the book wasn’t intended as a networking reference, but at least some information on that would have helped me visualize it in my head.
Overall, the book is well-done.  It does a nice job of gauging where you might be at, and attacking the problem from multiple sides.  Bitcoin is a frustratingly slippery thing – you think you have it figured out, then the zen of it falls out of your head for a second and you have to fight to get it back. I’m sure there are people for whom this is all crystal clear, but I am not one of them.

The book has narrative sections, and interestingly, a full-length comic book right in the middle of it. There’s a chapter about the cryptographic basis of Bitcoin that had a lot of math and graphs. I admit to skimming that one a bit.  There’s also perhaps an over-abundance of analogies. You start confusing them for each other after a while.

(Also, weirdly, the book is full of typographical errors.  I found three of them in a two-paragraph stretch, at one point.)

All in all, the book fulfilled its promise.  I was befuddled.  I’m still a little confused, but I’m light years ahead of where I was. Let’s call this book a primer – it gets you started, and gives the basic knowledge required to learn more, if that’s what you decide to do. Honestly, I feel like I probably know enough at this point to trust Bitcoin and perhaps become a user of it.  If I ever decided to mine it (something the book highly discourages) or develop against it, I would clearly need to know more.

But, for now, this is enough.

(If you want to see the first two chapters, which is where some of the core theory lies, the page at No Starch has a free download.)


Why Nigerian 419 Scam Emails Suck

December 2, 2014

You know how when you get a Nigerian scam email, and you read it, and you’re like “who the hell falls for these?”

I wondered, so I tweeted that I’d like to see a good content strategist re-write one of these emails to be super effective.  Someone responded and pointed me to this study from Microsoft: Why do Nigerian Scammers Say They are from Nigeria?

There’s a lot of math and stats in there, but here’s the gist – they’re designed to suck, so that they only attract the most gullible people.

By sending an email that repels all but the most gullible, the scammer gets the most promising marks to self-select, and tilts the true to false positive ratio in his favor.

If you’re still on-board after reading a pitch that bad, then there’s a good chance you’ll stay on-board all throughout the long process to separate you from your money.  People who can instantly sniff out a scam are low-payoff targets, and waste the scammer’s time.
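The economics can be seen in a toy model. All the numbers below are invented purely for illustration; the paper’s point is just that each reply costs the scammer effort, so a pitch that repels skeptics up front can out-earn a pitch that attracts lots of eventual dead-ends:

```python
def expected_profit(recipients, reply_rate, gullible_fraction, payoff, cost_per_reply):
    """Profit = (victims who pay out) - (effort spent working every reply)."""
    replies = recipients * reply_rate
    victims = replies * gullible_fraction
    return victims * payoff - replies * cost_per_reply

# A plausible-sounding pitch: many replies, but most marks wise up partway through.
plausible = expected_profit(1_000_000, 0.01, 0.01, 1000, 50)

# An absurd pitch: far fewer replies, but nearly everyone who bites follows through.
absurd = expected_profit(1_000_000, 0.0005, 0.5, 1000, 50)

print(plausible)  # -400000.0 -- false positives eat the scammer alive
print(absurd)     #  225000.0 -- the "worse" email wins
```

The plausible email actually loses money in this model, because working ten thousand replies that mostly go nowhere costs more than the hundred payoffs bring in.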


The “Import and Update” Pattern

November 12, 2014

Almost all CMSs support content import, to some extent. There’s always an API and often a web service for you to fire content into a system from the outside.

But a model we see over and over that really needs to be explicitly acknowledged is that of “import and update.” This means, create new content if it doesn’t exist, but update the content in-place if it was previously created. It’s used to support instances when we’re syncing information stored inside the CMS with information stored outside the CMS.

For example, let’s say our hospital maintains its physician profiles in a separate database (for whatever reason). However, we need our physicians to have managed content objects inside the CMS, for a variety of reasons (for a list of why this is handy, see my post on proxy objects in CMS).

We can easily write a job to import our physician profiles, but what happens when they update in the source database? We don’t want to import again, we just want to update the page inside the CMS. Sure, we could delete it and recreate it, but that becomes problematic when it might change the URL, or increment a set of ID numbers, or even delete information in the CMS which is referencing that specific content object (analytics, for example).

EPiServer has a “Content Channel” architecture that handles this.  You fire a dictionary of key-value pairs (representing content properties and their values) at a web service.  You can optionally include the GUID of an existing content object.  No GUID means EPiServer will create a new object, while data coming in with a GUID will find the corresponding page and update it with the incoming information. It essentially keeps the content object shell, but overwrites all the information in it.

With any system like this, you need to maintain a mapping between the ID outside the CMS, and the ID inside the CMS.  You need to know that Database Record #654 is updating Content ID #492. When iterating your database rows, when you run across ID #654, you know to reference ID #492 when talking to the CMS. You also need to be able to get the newly-created ID back out of the CMS when content is created, so you can create a mapping for it – if my CMS creates Content ID #732, I need to know this so I can reference it later.
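The mapping logic above can be sketched in a few lines. The CMS calls here are dictionary stand-ins, not any real product’s API – the point is just the branch on the mapping table: no mapping means create (and remember the new ID); an existing mapping means update in place:

```python
cms = {}        # content ID -> properties (pretend CMS repository)
id_map = {}     # external database ID -> CMS content ID
next_id = [100] # pretend CMS-assigned ID counter

def cms_create(props):
    """Stand-in for the CMS create call; the CMS reports back the new ID."""
    next_id[0] += 1
    cms[next_id[0]] = dict(props)
    return next_id[0]

def cms_update(content_id, props):
    """Stand-in for the CMS update call; keep the object, overwrite its data."""
    cms[content_id].update(props)

def import_and_update(rows):
    for row in rows:
        ext_id, props = row["id"], row["props"]
        if ext_id in id_map:
            cms_update(id_map[ext_id], props)   # seen before: update in place
        else:
            id_map[ext_id] = cms_create(props)  # new: create, remember mapping

import_and_update([{"id": 654, "props": {"name": "Dr. Smith"}}])
import_and_update([{"id": 654, "props": {"name": "Dr. Smith, MD"}}])
print(id_map)  # {654: 101} -- same content object both times, just updated
```

Running the same record through twice produces one content object, not two – the URL, the content ID, and anything referencing that ID all survive the second sync.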

Some CMSs offer “content provider” models, which are real-time methods to “mount” other repositories.  So, instead of importing and updating this data, the CMS reaches out to our external database in real-time when required to get objects back and mock them up as first-order content objects.

This is certainly elegant and sophisticated, but it presents problems with performance, uptime of the source system, unnecessary computational overhead if the content doesn’t change much, network topology and unbroken connectivity, and the inability to extend the content with new data inside the CMS (for instance, while 90% of the information about our physicians comes from the external database, perhaps we have a couple of properties that live inside the CMS only).

I hope I see this pattern more often. EPiServer has it, eZ publish has it, and I’m sure many others. Additionally, it’s not hard to build it. If you can put together a web service, you should be able to pull it off.

It’s a handy thing to have.


Metadata Depends on Perspective

November 12, 2014

I’m reading The Discipline of Organizing. Early in the book, the author talks about “metadata,” which is a topic I’ve complained about before (go read those; I’ll wait). When it comes to web content management, I think it’s hard to differentiate between the “first order data” and the “metadata.” Which is which?

The author calls it even further into question by introducing the perspective of the observer.

[…] what serves as metadata for one person or process can function as a primary resource or data for another one. Rather than being an inherent distinction, the difference between primary and associated resources is often just a decision about which resource we are focusing on in some situation. An animal specimen in a natural history museum might be a primary resource for museum visitors and scientists interested in anatomy, but information about where the specimen was collected is the primary resource for scientists interested in ecology or migration.


Things that Web Crawlers Hate

November 12, 2014

I wrote a web crawler in C# a couple years ago. I’ve been fiddling with it ever since.  During that time, I’ve been forcibly introduced to the following list of things my crawler hates.

  1. Websites that return a 200 OK for everything, even if it was a 404 or a 500 or a 302 or whatever
  2. Websites that don’t use canonical URL tags
  3. Websites with self-replicating URL rabbit holes
  4. Websites that don’t use the Google Sitemap protocol (no, I don’t depend on it, but it’s awfully handy to seed the crawler with starting points – I promise that a crawl will be better with one than without one)
  5. Websites that have non-critical information carried into the page on querystring params, thus giving multiple URLs to the same content
  6. Websites with SSL that don’t control their schemes – only allow secured pages under HTTPS, and vice-versa – so that you can’t have two URLs for the same content, just with different schemes
  7. Websites with a “print” option on every single page with a querystring param, thus giving that page two different URLs (okay, okay, this one is easy to filter for – I just always forget…)
  8. Misuse of the content-type HTTP header, because file extensions will handle it all…
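Several items on that list boil down to one chore: collapsing the many spellings of a URL into one canonical form before you decide whether you’ve crawled it. Here is a rough sketch of that normalization step – the noisy param names are examples, and a real crawler’s list would be site-specific:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative querystring params that don't change the content (items 5 and 7).
NOISE_PARAMS = {"print", "utm_source", "utm_medium", "sessionid"}

def normalize(url, force_scheme="https"):
    """Collapse equivalent spellings of a URL into one canonical form."""
    parts = urlsplit(url)
    # Drop noisy params and sort the rest so param order doesn't matter.
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k.lower() not in NOISE_PARAMS
    )
    return urlunsplit((
        force_scheme,                    # one scheme per URL (item 6)
        parts.netloc.lower(),            # hostnames are case-insensitive
        parts.path.rstrip("/") or "/",   # /page/ and /page are the same content
        urlencode(query),
        "",                              # drop fragments; they never reach the server
    ))

a = normalize("http://Example.com/page/?print=1&b=2&a=1")
b = normalize("https://example.com/page?a=1&b=2")
print(a == b)  # True -- the two spellings collapse to one URL
```

None of this would be necessary if sites used canonical URL tags and controlled their schemes, which is the point of the rant that follows.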

Admittedly, a lot of things in this list are why crawlers are hard to write, and I should just suck it up and deal with it because this is reality. But the entire process has underscored to me how loosely we treat URLs (see the canonical URL post linked above for more on this).

We’re generally very cavalier about our URLs, and I think the web as a whole is worse off for it. URLs are a core technology, and there’s a philosophical point behind them dealing with the universal access to information, findability, and indexability.

We should be more careful. Rant over.


Is Fareed Zakaria editing his own Wikipedia page?

November 11, 2014

Fareed Zakaria is Apparently Editing His Own Wikipedia to Remove Plagiarism Allegations:  CNN contributor Fareed Zakaria has been accused of plagiarism.

Our Bad Media has noted several edits to his Wikipedia page which they suspect are coming from Zakaria himself. The edits are coming from New York City where Zakaria lives, they remove a lot of the plagiarism accusations, and they do a couple other things which are curious:

The account’s second edit, made the same day as the first, strengthened his bio by noting he was not just an author, but an author of THREE BOOKS. […] Finally – and most tellingly – the editor did what only a good son would: fix the name of Zakaria’s mother, from “Fatima” to “Fatma.”

Is this proof that Zakaria is editing his own Wikipedia page?  Not conclusive, certainly, but it sure is interesting.


Startup Depression

November 11, 2014

Startup Without Depression: A site dedicated to combatting depression in the startup world.

Depression in the startup community can be an unfortunate byproduct of the stresses of creating something from nothing. For each individual that finds the strength to speak or write publicly of their struggles, many more grapple silently with their own demons. Below is a small collection of resources that offer professional help for those battling depression and related illnesses, as well as a sampling of writing by individuals in tech willing to share their struggles.


Do Hyperlinks Change the Meaning of Content?

November 7, 2014

I’ve been thinking deeply about the idea of hypertext lately (reading Vannevar Bush didn’t help), and I’m curious if there’s a standard, convention, or best practice for the actual selection of words to link in a sentence? Additionally, to what extent does the existence of a link and the placement of that link affect the perceived meaning of the underlying text?

Historically, we’ve all hyperlinked the infamous “click here” phrase, and accepted that this doesn’t make sense without the link.  But is this effect even more subtle?

Consider, in fact, the hyperlink in the parenthetical aside in the first sentence of this post: “(reading Vannevar Bush didn’t help)”.  There are four ways, I think, to link it: link “reading,” link “Vannevar Bush,” link “didn’t help,” or link the entire phrase.

I think each one of those changes the sentence, subtly — the existence of the link and its positioning has an actual effect on how the sentence is perceived.

Is the important point of this sentence that…

  1. I read something (as opposed to doing something else with it)
  2. I read Vannevar Bush in particular (as opposed to reading someone else)
  3. It “didn’t help” (as opposed to having some other effect — the “didn’t help” is sarcastic)
  4. The combination of all three
So, the link itself becomes part of the content. Whether it wants to or not, where the link is situated changes the meaning of the words.

Does the hyperlink change the emphasis of the sentence, if you were to read it out loud?  Would you mentally incorporate the hyperlink into your verbal presentation of the sentence?

(After I posted this, Arild Henrichsen made a tweet referencing Chandler Bing from Friends and his tendency to emphasize the word “be.” Funny as this is, the point is valid — Chandler aptly demonstrates how you might mentally read a sentence where the word “be” is hyperlinked).

More importantly, if the link was gone, would the sentence even make sense on its own?  That sentence depends on its link target to impart meaning.  If there was nothing to click on, the sentence would be some random non sequitur with no context (unless, of course, you had read the Vannevar Bush post relatively recently, and were independently able to connect the two).  With the link, the reader can click through and understand exactly what I’m talking about.

But even if they never follow the link, the fact that it’s there makes them think there’s some explanation to a sentence which is otherwise random — they are aware that this requires explanation. They can choose to seek out this explanation if they want, or else they can just acknowledge that there is an explanation, and decide that they don’t care. But the hyperlink signals that further information about a given word or phrase exists, which is helpful – if someone is making an inside joke and you know this, it’s much less confusing.

Links provide context. Their existence and positioning impart and affect meaning.


“As We May Think”

November 7, 2014

I’ve become quite interested in Internet history lately, and I’ve run across Vannevar Bush’s name multiple times. He was an American scientist, quite active during World War II, and is historically known for expounding on an idea he had for a device called the “memex,” which was, in some ways, a precursor to the web itself.  (Tim Berners-Lee, in fact, has cited Bush’s work as foundational to his own work.)

Bush was vexed by the difficulty in recording knowledge and — more importantly — recalling it, in the 1940s.  The idea of massive bound volumes frustrated him, because he was convinced that the human mind just didn’t work that way. He expounded on this in a famous 1945 essay published in The Atlantic entitled As We May Think:

Our ineptitude in getting at the record is largely caused by the artificiality of systems of indexing. When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can be in only one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and re-enter on a new path.

Linear storage was a problem, not a solution. Bush wanted to store information the way the human mind worked:

The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain.

To this end, he elaborated on his idea of the memex:

A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory. [...] It consists of a desk, and while it can presumably be operated from a distance, it is primarily the piece of furniture at which he works. [...] All this is conventional, except for the projection forward of present-day mechanisms and gadgetry. It affords an immediate step, however, to associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another.
That last bit is essentially the basis of hypertext.

The entire essay is worth reading. It’s so celebrated in Internet history, in fact, that a symposium was held in 1995 in honor of its 50th anniversary. In 2005, the 60th anniversary was celebrated with a panel discussion at the ACM Hypertext and Hypermedia conference (video here).

A lot of the first half is Bush discussing various photographic technologies and their possibilities for recording knowledge (he gets very close to inventing Google Glass at one point).  The bit about human thought processes and the memex comes at the very end.



Racism on Reddit

October 28, 2014

Hate Speech Is Drowning Reddit and No One Can Stop It: I was vaguely aware of this, but I don’t frequent many of the subs where this comes to light.

Reddit has a hate speech problem, but more than that, Reddit has a Reddit problem. A persistent, organized and particularly hateful strain of racism has emerged on the site. Enabled by Reddit’s system and permitted thanks to its fervent stance against any censorship, it has proven capable of overwhelming the site’s volunteer moderators and rendering entire subreddits unusable.

More and more, I think Reddit’s best days are behind it. The site has seemingly devolved into one big inside joke.  Stephen Colbert said much the same thing on the first episode of Slate’s new podcast, Working:

I read Reddit in the morning [pause] …which is not as useful as it used to be. I used to feel that it was more stories and less memes, photographic memes. Now it’s just been sort of consumed by Imgur photographic memes.