Gadgetopia

May 16

SixthSense Input

The Thrilling Potential of SixthSense Technology: If you watch one TED video, I’d recommend this one.  This is Pranav Mistry demonstrating SixthSense, which is a touch and gesture based input device.  It gets pretty remarkable as the video goes on – this guy has completely rethought input devices, and the result is amazing.


May 12

Use Canonical URLs, Please

If you want to be friendly to the web, do the world a favor and start using the canonical URL LINK tag.  More things than you realize depend on the simple principle of identifying a page by a unique URL, and it’s getting harder than you think.

It turns out that a web without canonical URLs is like a database without primary keys.

I recently wrote a web crawler (yes, seriously), and I was forcefully introduced to how vague URLs can be.  The idea that a unique page of content has a single URL is laughably naïve.

Turns out that the absolute hardest part of writing a crawler is “normalizing” URLs – taking two URLs, and trying to figure out if they’re actually addressing the same resource.  Fact is, you can address a page of content in more ways than you think.

Some examples —

You need to account for SSL vs. non-SSL.  A lot of sites will accept inbound requests for both “http” and “https” to the same URL, and return the same page of content.  This technically results in two separate URLs, and if a crawler is cataloguing URLs, it needs to account for the fact that this is really the same page of content, even if the bytes of the URL differ.

Now, that one isn’t too hard.  There aren’t many pages that differ remarkably if they’re secure or not.  But what about domain?  Your website could respond to multiple domains.  It could be as simple as the same content coming up under “www.gadgetopia.com” and “gadgetopia.com”, or as complex as hundreds of different domain names generating the same pages.

It gets worse – what about querystring arguments?  The fact is that different arguments have different degrees of import.  Some are critical in determining the content of the page (“article_id”) and others really only matter to humans interacting with the page (“return_page”).  There’s a whole bucket of querystring arguments that really have no effect on the core content being returned to the user agent.

(URL arguments for analytics are especially bad.  Click a link out of a Feedburned blog post, and you end up with “utm_source” and “utm_medium” as querystring arguments, neither of which has any bearing on the actual content of the page returned.)

Differing capitalization could technically result in different pages too (although this would be terribly bad form…)

I could go on and on about URL vagaries, but just understand that this URL —

https://domain.com/page.php?article_id=5&return_page=6

— and this URL —

http://www.domain.com/Page.php?article_id=5&return_page=7

may return the exact same page of content, but I have no way of knowing this.

On a known site (a site I own or am crawling for a client), I can make some rules, like always knowing that I should swap “domain.com” for “www.domain.com,” but if I’m doing a crawl of a site I have no connection with (a “hostile” crawl?), then I just have to assume those two URLs are actually two separate pieces of content and index them as different pages even though “article_id=5” probably indicates they return the same thing.
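As a rough sketch of what those rules might look like on a known site, here is one way to collapse the two example URLs above into a single key. Everything in it is an assumption about one hypothetical site: that "www" is the preferred host, that "http" and "https" serve the same content, that paths are case-insensitive, and that the arguments in IGNORED_PARAMS never change the content. None of this is safe to assume on a site you don't control.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical per-site rule: arguments that never change the core content.
IGNORED_PARAMS = {"return_page", "utm_source", "utm_medium"}


def normalize(url):
    """Reduce a URL to one comparable form under this site's assumed rules."""
    parts = urlsplit(url)

    # Assumed rule: the bare domain and the "www" host serve the same pages.
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host

    # Keep only the arguments that affect content, in a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in IGNORED_PARAMS
    )

    # Assumed rules: http and https are equivalent, and path case is ignored.
    return urlunsplit(("http", host, parts.path.lower(), urlencode(query), ""))


a = normalize("https://domain.com/page.php?article_id=5&return_page=6")
b = normalize("http://www.domain.com/Page.php?article_id=5&return_page=7")
print(a)       # http://www.domain.com/page.php?article_id=5
print(a == b)  # True
```

Every one of those rules is a guess baked into code, which is exactly why this only works when you already know the site.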

And none of this takes into account the new world of visitor segmentation and anonymous personalization.  If you live in California, you might get a different page than if you live in New York.  So where is your crawler coming from, and how is it ever going to emulate someone from somewhere else?

(For a while, I tried to abandon URLs and hash the actual HTML returned, then compare the hashes.  This would tell me, more clearly, if this page is unique.  But that too is problematic for a number of reasons – sometimes querystring arguments, for instance, change the page in tiny, effectively meaningless ways, but ways which result in an entirely different hash.)
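To illustrate why hashing is so brittle, here is a minimal sketch. The two HTML snippets are invented for the example, and SHA-256 is an arbitrary choice of hash. They stand in for the same article reached with two different "return_page" values, where the only difference is a single character in a "Back" link.

```python
import hashlib


def page_hash(html):
    """Hash the raw HTML of a page; any byte-level change alters the result."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


# Made-up pages: same article, but the "Back" link reflects return_page.
page_a = '<html><body><a href="/list?page=6">Back</a><p>Article five.</p></body></html>'
page_b = '<html><body><a href="/list?page=7">Back</a><p>Article five.</p></body></html>'

print(page_hash(page_a) == page_hash(page_a))  # True
print(page_hash(page_a) == page_hash(page_b))  # False
```

A human would call those the same page, but the hashes share nothing, so the comparison can only ever say "byte-identical" or "not."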

This is where canonical URLs help.  For each page of content, include a canonical LINK tag which indicates the one true URL it should be accessed under anonymously.  Here’s Google’s page about them, and here’s what one of them looks like.

<link rel="canonical" href="http://example.com/my/url" />

It’s not just crawlers that depend on this — any site which needs to tell one URL from another would benefit from this.  If you submit a URL to Reddit, it checks to see if it’s been submitted already.  To do this, it depends on the fact that the URL has some consistency.

If you are writing software that somehow keys off a URL, look for a canonical LINK tag and use it if you find it.  By including it, the site owner is doing you a massive favor.  Don’t ignore it.
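Here is a minimal sketch of that lookup using only Python's standard library. The names CanonicalFinder and url_key are made up for this example, and it assumes you have already fetched the HTML as a string. It takes the first canonical LINK tag it finds and falls back to the fetched URL when there isn't one.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, in any case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def url_key(fetched_url, html):
    """Return the canonical URL if the page declares one, else the fetched URL."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return fetched_url
    # Resolve a relative href against the URL the page was fetched from.
    return urljoin(fetched_url, finder.canonical)


page = '<html><head><link rel="canonical" href="http://example.com/my/url" /></head></html>'
print(url_key("http://example.com/my/url?utm_source=feed", page))
# http://example.com/my/url
print(url_key("http://example.com/other", "<html><head></head></html>"))
# http://example.com/other
```

A real crawler would probably also want to sanity-check the declared URL (same site, actually resolves), but that is beyond this sketch.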

Using a canonical URL is like declaring a primary key on your content.  You are saying, effectively, that “no matter how you actually got to this page of content, this URL is the official URL for this page and should be used when discussing this page.”

The web will be a better place for it.


May 12

What does “published” mean anymore?

“Self-published” is not in any way analogous to “published”: There’s a very interesting discussion going on over at Reddit that’s very similar to a line of thought I’ve had for quite a while.

The Internet has made it very easy to “publish” writing, in some form.  Pre-Internet, to get “published” meant to send a book off to a publisher, go through a long vetting process, and see your book on the shelves of a store somewhere.  Not anymore.  With Lulu, you can get a hardcopy from a PDF, and with Amazon, you can have your book distributed as an ebook quite easily (it doesn’t even have to be a book – you can “publish” a glorified blog post as a Kindle Single, even).

So, does this mean you’re “published”?

Nononno!  Being published is not just about having a hard copy of one’s work - that misses the forest for the trees. Being published is about convincing a third party that your work is worthwhile enough to support and make public. It’s about earning the respect of a group completely independent of you and having them fund the dissemination of your ideas.

Some of the comments are quite good and thought-provoking:

I was just having a discussion on this yesterday in my library studies class. A lady in the class kept referring to herself as a ‘published author’ and when I investigated further I found that all she does is chuck her romance novels up on her website as eBooks.

[…] I came here to say something like this, specifically about the wrench that sites like kickstarter throw into the works. As the original post states, “Being published is about convincing a third party that your work is worthwhile enough to support and make public” but traditional publishing houses are no longer the only viable way of doing that.


Apr 30

The Facebook IPO Effect on Real Estate

Silicon Valley real estate: The Facebook effect: The pending Facebook IPO is affecting Silicon Valley real estate.  Sellers are keeping their homes off the market until the IPO is done and there’s a new batch of buyers, flush with cash.

Though the number of actual prospective home buyers with Facebook connections is only a fraction of all buyers in the Valley, their psychological effect on the market is unmistakable. In Palo Alto, in particular — which Mark Zuckerberg calls home —sellers are either keeping their homes off the market until the IPO or ramping up expectations. For the first quarter of 2012, according to BrokerMetrics, the median price of a single-family P.A. home went up 11%, whereas inventory declined 57%


Apr 29

The Rise of the Brogrammer

“Gangbang Interviews” and “Bikini Shots”: Silicon Valley’s Brogrammer Problem: Here’s an article about the apparent growing sexism in the programmer business.

Remember, a few years ago, we discussed the guy who used Playboy’s CyberGirls in his presentation, and then there was the guy who used a running porn metaphor to describe CouchDB.

This article starts off with a similar story about a presenter at SXSW:

He said he’d won over Digg’s elusive cofounders by sending them “bikini shots” from a “nudie calendar” he’d put together with photographs of fellow students posing in their swimsuits.

Van Horn continued with some tips for hiring managers: He cautioned against “gangbang interviews”—screening prospective employees by committee—and made a crack about his fraternity’s recruiting strategy, designed to “attract the hottest girls” on campus. He seemed taken aback when nobody laughed. “C’mon, guys, we all know how it was in college,” he muttered.

The article then launches into a long discussion, complete with too many examples to follow, about how women are marginalized in the programming trade, and how this has led to the rise of the “brogrammer” – a geek who revels in the male-centric culture to the point of being blatantly sexist.

It’s an interesting read.


Apr 28

The Virtues of eBooks

Books: Bits vs. Atoms: I’m trying very hard to get over my attachment to physical books.  Jeff Atwood nudges me further in that direction.

At the risk of stating the obvious, if your goal is to get a written idea in front of as many human beings as efficiently as possible, you shouldn’t be publishing dead tree books at all. You should be editing a wiki, writing a blog, or creating a website.

For some reason, I’m addicted to physical books.  I previously discussed my experience with an original Kindle, which wasn’t great.

There are two things that ebooks still just don’t do for me.

  1. They don’t provide some physical reminder of their presence.
    I love being surrounded by books.  I see them, and I think about them.  I stack them up as visual representations of knowledge.  Letting my eyes drift across the titles on their spines often makes my mind drift in interesting directions.  I need books lying around, it seems.  They’re like…trophies.
  2. They can’t be shared easily.
    I would guess we have almost 1,000 books floating around Blend.  We refer to them by title all the time.  We pass them around.  We drop them on each other’s desk.  They are a communal repository of knowledge, owned by the collective, that can be used by the individual.  I can’t get the same architecture from ebooks.

Interestingly, Atwood talks about a lot more shortcomings later in his article.  He concentrates a lot on DRM, and the layout and presentation differences between ebooks and printed books.

But, other than those things, ebooks are so much more practical in every way.  I need to get over this hang-up, and transition away from physical books.


Apr 28

The Hyper-Addiction of Casual Gaming

Just One More Game …: This article is a neat look at casual games, and why they’re so addicting.  How did we get from Call of Duty to Angry Birds, and was Tetris just a gateway drug?  Why are these games so addictive?

The game was an anesthetic, an escape pod, a snorkel, a Xanax, a dental hygienist with whom to exchange soothingly meaningless banter before going under the pneumatic drill of Life. Soon I found myself struggling in the net of real addiction. Even as I pressed “New Game,” my brain would be thinking, very consciously, I have to stop playing this game. But I didn’t. Instead, I spread the Drop7 virus to other people: my wife, my friends, my mother, my in-laws. I found myself playing in all kinds of extreme situations: at 3 a.m., during a severe gastrointestinal crisis; immediately after an intense discussion with my mother; shortly after learning that my dog — the warm, emoting mammal I lived with for 12 years — was probably dying of cancer.


Apr 27

The Limits of Spam Algorithms

Yelp, You Cost Me $2000 by Suppressing Genuine Reviews, Here’s How You Fix It: An interesting story of a false positive spam algorithm doing some damage.

Yelp flagged poor reviews of a moving company as spam, and hid them.  Turns out, they were legit – the moving company were not nice people – but since this guy never saw the reviews, he hired the company anyway.

Turns out, the behavior meant to indicate spam overlaid perfectly on the behavior of people trying to complain about this company.

Your algorithm typically hides entries by people who only post one review and who don’t otherwise engage in Yelp. Your assumption is that if a user only posts one review, posts no comments, has no friends etc. then most likely they are fake and trying to game the system.

[…] In each case the one star review was left by someone who would never normally leave a review… they were simply so outraged that they were motivated to signup to Yelp and try to warn others how bad this company is. None of them ever used Yelp again. Furthermore, they didn’t have the knowledge or inclination to try to make their Yelp profile look acceptable to Yelp’s automated suppression systems.


Apr 22

2012 Intranet Innovation Awards

The 2012 Intranet Innovation Awards are now open for entries!: My friends over at Step Two have opened up the Intranet Innovation Awards for 2012.

The Intranet Innovation Awards are global awards that celebrate new ideas and innovative approaches to the enhancement and delivery of intranets. The goal is to find these remarkable solutions, and to share them with the wider community.

I had a long travel stretch late last year, and I actually bought all five of the previous award reports to read on planes and in airports.  It was great reading.  Intranets can be so inscrutable, because you can’t see anyone else’s, so if you’re wondering what cool things people are doing, buy this report when it’s released.  Until then, consider entering something – you’ll be in good company.


Apr 20

Social Media is Taking Over Corporate Blogs

More companies quit blogging, go with Facebook instead: On a lesser scale, this is happening to entire websites as well.  Some companies are just redirecting their home pages to their Facebook pages.

With the emergence of social media, more companies are replacing blogs with nimbler tools requiring less time and resources, such as Facebook, Tumblr and Twitter.

A survey released earlier this year by the University of Massachusetts Dartmouth says the percentage of companies that maintain blogs fell to 37% in 2011 from 50% in 2010, based on its survey of 500 fast-growing companies listed by Inc. magazine. Only 23% of Fortune 500 companies maintained a blog in 2011, flat from a year ago after rising for several years.


Apr 8

The Hailstorm of Lawsuits in the Mobile Industry

Apple’s War on Android: My normally vitriolic stance towards Apple is softening a bit, but I still need to post this article that details the intellectual property war between Apple and…everyone else, it seems.  Specifically, Apple and Samsung are suing each other, mainly because Apple can’t sue Google directly over Android.

Here’s the crux of one of the suits – a description of what Apple feels is a “trademarkable” thing:

a rectangular product with four evenly rounded corners, a flat clear face covering the front of the product, [and] a large display screen under the clear surface.

I find it absurd that you can try to trademark something that general. Hell, the coffee table in my living room resembles this description perfectly.  The article itself has pictures of a couple of pre-iPhone products that match that description too.

But, the fault doesn’t just lie with Apple – everyone in the mobile space is suing everyone else, which seems to be the only reason Google purchased Motorola’s mobile unit:

Google announced it would pay $12.5 billion to acquire the company’s mobile-phone operation and its 17,000 patents. The deal, said Google CEO Page, will “enable us to better protect Android from anticompetitive threats from Microsoft, Apple, and other companies.” In other words: You sue us, we sue you.

The last paragraph of the article sums up the mess quite elegantly.

In the short run, the tech giants could save themselves considerable legal fees and distraction if they were to lock their lawyers in a hallway of conference rooms and refuse to release them until they had crafted a series of comprehensive cross-licensing pacts. This process eventually resolved similar litigation in the desktop computer field. Such a solution “is still probably what will happen here,” says Stanford’s Lemley. “But in the meantime, these companies have paid their lawyers more than $400 million” over the last several years. “It’s not clear what they’re getting for that money.”


Apr 7

Why List Articles Are So Popular

The List of N Things: Paul Graham nails the psychology of the list article and why it plays so well on the web – it guides us through the topic, and doesn’t force us to think too hard.

Structurally, the list of n things is a degenerate case of essay. An essay can go anywhere the writer wants. In a list of n things the writer agrees to constrain himself to a collection of points of roughly equal importance, and he tells the reader explicitly what they are.

Some of the work of reading an article is understanding its structure […] As well as being explicit, the structure is guaranteed to be of the simplest possible type: a few main points with few to no subordinate ones, and no particular connection between them.

I’ve talked about this before: The Psychology of the Bullet Point.

Bullet points signify a complete, contained, discrete thought.  They encapsulate some nugget of information, separate from everything else.  A bullet point tells us, “this piece of information is absorbable solely from the text in it,” and the text is usually short.


Apr 7

Is there a point to pagination anymore?

The End of Pagination: Jeff Atwood makes the case that pagination may just be an outdated concept.

I can understand paginating when you have 10, 50, 100, maybe even a few hundred items. But once you have thousands of items to paginate, who the heck is visiting page 964 of 3810? What’s the point of paginating so much information when there’s a hard practical limit on how many items a human being can view and process in any reasonable amount of time?

I’ve talked about this before: The Pointlessness of Category Archives.


Mar 21

Keanu Reeves on the Drawbacks of Digital Film

Steven Spielberg & Martin Scorsese: the joy of celluloid: The Guardian asked several film people what they thought the switch from celluloid (“real”) film to digital film meant.

Keanu Reeves responded with a couple of really thoughtful observations about how the physical limitations of film affected how he acted in front of the camera.  By extension, some of this is lost with digital.

The biggest difference I have found when working photochemically versus digitally on motion pictures is the length of time the takes can last. Broadly, a 1,000ft roll of 35mm film lasts around nine-and-a-half minutes before running out, while a digital tape or recording card or hard drive can last from 40 minutes to over an hour and a half. This translates to a very different rhythm on the floor; the pressure to “cut” to save film is alleviated.

And the temporary nature of digital – the fact that it can be wiped out and reshot with nothing lost – changes the vibe he gets.

When the director says: “Action”, and the film is rolling, it feels like something is at stake. It feels important and intense. In a way, death is present in the rolling of that film – we live, right now – and the director says: “Cut”. And that moment in time is captured on film, really.


Mar 21

The Next Evolution in Open CourseWare

The Higher Education Monopoly is Crumbling As We Speak: The Internet is threatening to destroy higher education.

First you had Open CourseWare, which was great, but incomplete.  The problem is that there was no test for mastery.  You could say you took a course, but there was no proof of this, nor was there anything that certified that you learned something, so it was really nothing more than a “hobby option.”

Now, however, more cracks are appearing in the form of certifications from online courses given by some of the top universities in the country.  How long before these begin to substitute for credit?

The news was that the Stanford professors were letting students in their global classroom sit for the midterm, at proctored sites around the world. Those who did well on the A.I. test and a later final exam got a letter saying so, signed by the professors, a pair of well-known roboticists from Silicon Valley.

A few days later, MIT made a major announcement: The world-famous research university would be creating a new non-profit organization called MITx. It, too, would be offering free online courses, designed from the ground up to serve tens or even hundreds of thousands of students worldwide. And it, too, would administer exams to students who, if they passed, would receive a certificate saying so from MITx.



