Gadgetopia: Search Engines

 This channel has its own RSS feed at this link.

Gadgetopia Channel

Search Engines

Aug 2

Page Views, Visits, and Visitors: Some Google Analytics Definitions

I wrote what’s below in an email to a client to help them understand their Google Analytics reports.  I found it the other day and thought maybe you folks could get some use out of it too.  It’s some definitions of the core pillars of Google’s analytics architecture.

Here goes (and if anyone finds any inaccuracies here, please point them out):

A page view is the simplest metric to understand.  It’s counted every time a single page is rendered in a browser.

A unique page view is the same as a page view, but multiple views of the same page during the same visit (see below) are only counted once.  So if Bob visits the home page, then the mortgage page, then the home page again, he has racked up three page views, but only two unique page views.

(Why care about the difference?  If you sell advertising on your site, you care about raw page views.  If Bob visits the same page 10 times during a visit, you don’t care, because you get an ad impression every time.  But if you’re actually trying to measure the effectiveness of your site, then multiple views of the same page are not worth noting, so you’d care about unique page views.)
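To make the difference concrete, here is a minimal sketch in Python. It assumes you already have the ordered list of pages from a single visit; the function name and the page paths are made up for illustration, and this is not how Google Analytics computes anything internally.

```python
def count_page_views(pages_in_visit):
    """Return (page views, unique page views) for one visit's list of pages."""
    page_views = len(pages_in_visit)              # every render counts
    unique_page_views = len(set(pages_in_visit))  # each page counts once per visit
    return page_views, unique_page_views


# Bob's visit from the example: home, mortgage, home again (hypothetical paths).
bob_visit = ["/home", "/mortgage", "/home"]
print(count_page_views(bob_visit))  # (3, 2)
```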

A visit is a single user session.  So Bob, in the above example, has generated a single visit.  If he comes back tomorrow, that’s a new visit.  (Visits time out after 30 minutes, so if Bob came back 45 minutes later, it would also be a new visit.)
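The 30-minute rule can be sketched like this. It assumes the timeout is measured from the visitor's previous hit (a period of inactivity) and that you have a list of hit timestamps for one person; the names and the sample times are hypothetical.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # assumption: gap measured from the previous hit


def count_visits(hit_times):
    """Count visits for one person; a gap longer than the timeout starts a new visit."""
    visits = 0
    previous = None
    for hit in sorted(hit_times):
        if previous is None or hit - previous > VISIT_TIMEOUT:
            visits += 1
        previous = hit
    return visits


hits = [
    datetime(2008, 12, 12, 9, 0),
    datetime(2008, 12, 12, 9, 10),   # 10 minutes later: same visit
    datetime(2008, 12, 12, 9, 55),   # 45 minutes later: new visit
]
print(count_visits(hits))  # 2
```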

Visitors (also known as “absolute unique visitors”) get a little more complicated.  The visitor count is the number of unique people who visited during the reporting time period you’re looking at.  Remember that every report in GA is limited by a date range.  If you’re looking at one month, Bob will only be counted as one visitor during that period, even though he may have visited every day (incurring a visit each time).

A new visitor is a visitor who has not been recorded visiting prior to the time period being viewed.  A returning visitor is a visitor who has visited prior to the reporting period being viewed.

Here’s an example that brings them all together —

Bob visits your site on December 12, 2008, viewing 10 total pages.  However, he kept returning to the home page, which he viewed four times.

In this case, Bob has generated one visit, 10 page views, and 7 unique page views (six pages and the home page, counted only once).

Bob returns on January 5, 12, and 20, 2009.

If you look at the reports for January only, here is what you would see.

Bob would be a single visitor during that time.  He would be classified as a returning visitor, because of his visit in December.  (If you looked at the report for December, however, he would be classified as a new visitor in that month.)

Bob would also have registered three visits during January, plus any page views and unique page views that he generated during his visits.
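Here is the same example as a small Python sketch. It assumes each visit is stored as a (visitor, date) pair and that "new" means no recorded visit before the start of the report range, as defined above; the function and field names are invented for illustration.

```python
from datetime import date


def summarize_visitors(visits, start, end):
    """Summarize (visitor_id, visit_date) pairs for the report range start..end."""
    first_seen = {}
    for visitor, day in visits:
        if visitor not in first_seen or day < first_seen[visitor]:
            first_seen[visitor] = day

    in_period = [(v, d) for v, d in visits if start <= d <= end]
    visitors = {v for v, _ in in_period}
    # Returning: seen at least once before the report range began.
    returning = {v for v in visitors if first_seen[v] < start}
    return {
        "visits": len(in_period),
        "visitors": len(visitors),
        "new_visitors": len(visitors - returning),
        "returning_visitors": len(returning),
    }


bob_visits = [
    ("bob", date(2008, 12, 12)),
    ("bob", date(2009, 1, 5)),
    ("bob", date(2009, 1, 12)),
    ("bob", date(2009, 1, 20)),
]
january = summarize_visitors(bob_visits, date(2009, 1, 1), date(2009, 1, 31))
december = summarize_visitors(bob_visits, date(2008, 12, 1), date(2008, 12, 31))
print(january)
print(december)
```

For January this reports three visits from one returning visitor; for December it reports one visit from one new visitor.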


Jun 8

The Bing API

Microsoft Releases Bing API - With No Usage Quotas: This is fairly cool.

In a world where APIs are often limited in many ways, it’s notable that in addition to these technical updates that Microsoft has removed the API usage quotas found in the Live Search API, with just the requirement that it be used for “user-facing applications” only. Note that the terms of use have also been loosed to allow more flexible presentation options such as no restrictions on ordering and blending search results.

I’ve been using Bing for a week now and I really like it.  In particular, the mouse-over result summaries and image searching are actually better than Google’s.


May 14

Wolfram Alpha

Wolfram Alpha: This is getting a lot of buzz. Early previews (screencast — I haven’t watched it) are apparently quite good.

Wolfram Alpha (also spelled WolframAlpha or Wolfram|Alpha) is an answer-engine developed by the international company Wolfram Research. It is an online service that answers factual queries directly by computing the answer from structured data, instead of providing a list of documents or web pages that might contain the answer.

[…] Wolfram Alpha is not a search engine, as it does not look up answers to queries on an index of web pages or documents. Queries and computations are similarly posed to it via a text field, but it computes answers and relevant visualizations on the fly from a knowledge base of curated, structured data.


Apr 13

Google Disapproval


The image above is what happens when Google gets mad at you. This graph represents referrals to Gadgetopia from Google.

The big valley in the middle there is the result of some sloppiness on my part. I was testing something on Gadgetopia, and accidentally left it in place. It was a Thickbox-powered IFRAME that loaded a different Web site. So, I had a hidden IFRAME to a different domain, which apparently Google does not like.

I was aware that Google traffic had dropped off considerably, but I couldn’t figure out why and didn’t really have time to look into it. Then one day, I was running a search engine spider on Gadgetopia (another test for something else — I test all sorts of stuff on Gadgetopia), and I noticed it kept coming back with weird keywords and descriptions. That’s when I remembered the IFRAME….

I searched back through old emails about the project that used the IFRAME, compared the dates with Google Analytics, and realized that Google search traffic started to drop off about two days after I installed the Thickbox code.

I removed it, and, sure enough, Google referrals started to climb again within 48 hours. They’re now back where they were pre-stupidity.

For the record, I don’t think it had anything to do with Thickbox directly. I think it was the combination of (1) a hidden (2) IFRAME pointing to (3) a different domain that did it. I don’t know the exact black-hat SEO mechanism at work here, but I’m sure people have abused something similar to that scenario.
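If you want to audit your own pages for that combination, a rough sketch follows. It only looks at inline style attributes, so it would miss IFRAMEs hidden by a stylesheet or by script (which may well be how Thickbox does it), and it says nothing about what Google actually checks. The class and function names are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HiddenIframeFinder(HTMLParser):
    """Collect iframes that are hidden inline and point at another host."""

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        style = (attributes.get("style") or "").replace(" ", "").lower()
        host = urlparse(src).netloc
        is_hidden = "display:none" in style or "visibility:hidden" in style
        is_foreign = bool(host) and host != self.page_host
        if is_hidden and is_foreign:
            self.suspects.append(src)


def find_hidden_foreign_iframes(html, page_host):
    finder = HiddenIframeFinder(page_host)
    finder.feed(html)
    return finder.suspects


sample = (
    '<p>Hello</p>'
    '<IFRAME src="http://other.example/page" style="display: none"></IFRAME>'
    '<iframe src="/local.html" style="display:none"></iframe>'
    '<iframe src="http://other.example/visible"></iframe>'
)
print(find_hidden_foreign_iframes(sample, "gadgetopia.example"))
```

Only the first IFRAME in the sample is flagged: the second is hidden but local, and the third is foreign but visible.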


Feb 21

More Information About How Google Works

Jeff Dean keynote at WSDM 2009: A Google engineer gave a talk at a conference where he revealed some crazy stats about Google’s architecture:

Google now detects many web page changes nearly immediately, computes an approximation of the static rank of that page, and rolls out an index update. For many pages, search results now change within minutes of the page changing.

[…] Their performance gains are also impressive, now serving pages in under 200ms. Jeff credited the vast majority of that to their switch to holding indexes completely in memory a few years back. […] that now means that a thousand machines need to handle each query rather than just a couple dozen […]

So my query hits a thousand machines? Maybe the “Google kills trees” argument from a couple months ago wasn’t so far off base?

Google’s tweaking went all the way down to where the data was physically located on disk:

[…] Jeff said they paid attention to where their data was laid out on disk, keeping the data they needed to stream over quickly always on the faster outer edge of the disk, leaving the inside for cold data or short reads.


Jan 28

Google to Show Favicons in Search Results?

Google Tests Site Favicons in Select Search Results: Interesting.

Google has confirmed reports that it is testing the use of favicons in search results for select users.

[…] At present, favicons in search results will only appear when a user conducts a “site: command search.” The feature is not available to all users.


Jan 11

Searching Google, Killing Trees

Revealed: the environmental impact of Google searches: here is your guilt trip of the day.

Performing two Google searches from a desktop computer can generate about the same amount of carbon dioxide as boiling a kettle for a cup of tea, according to new research.

While millions of people tap into Google without considering the environment, a typical search generates about 7g of CO2. Boiling a kettle generates about 15g.

Here’s supposedly the key to the increased energy usage:

[…] your request doesn’t go to just one server. It goes to several competing against each other. […] It may even be sent to servers thousands of miles apart. Google’s infrastructure sends you data from whichever produces the answer fastest.

Update: Google is disputing this number.

Together with other work performed before your search even starts (such as building the search index) this amounts to 0.0003 kWh of energy per search, or 1 kJ. For comparison, the average adult needs about 8000 kJ a day of energy from food, so a Google search uses just about the same amount of energy that your body burns in ten seconds.

In terms of greenhouse gases, one Google search is equivalent to about 0.2 grams of CO2. The current EU standard for tailpipe emissions calls for 140 grams of CO2 per kilometer driven, but most cars don’t reach that level yet. Thus, the average car driven for one kilometer (0.6 miles for those of us in the U.S.) produces as many greenhouse gases as a thousand Google searches.
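The numbers in Google's rebuttal are easy to sanity-check. This is just the arithmetic from the quote, using only the figures given above plus the standard conversion of 1 kWh = 3600 kJ; the variable names are mine.

```python
KJ_PER_KWH = 3600  # 1 kWh = 3600 kJ

# Energy per search: 0.0003 kWh, which the quote rounds to 1 kJ.
energy_per_search_kj = 0.0003 * KJ_PER_KWH

# How long the "average adult" at 8000 kJ a day takes to burn that much.
seconds_per_day = 24 * 60 * 60
body_kj_per_second = 8000 / seconds_per_day
seconds_of_body_energy = energy_per_search_kj / body_kj_per_second

# Searches per kilometer for a car that exactly meets the 140 g/km standard.
searches_per_km_at_standard = 140 / 0.2

# Gap between the newspaper's 7 g figure and Google's 0.2 g figure.
estimate_ratio = 7 / 0.2

print(round(energy_per_search_kj, 2))      # 1.08
print(round(seconds_of_body_energy, 1))    # 11.7
print(round(searches_per_km_at_standard))  # 700
print(round(estimate_ratio))               # 35
```

So the "ten seconds" figure holds up as a round number, and a car at the EU standard works out to about 700 searches per kilometer; the "thousand" in the quote rests on most cars emitting more than the standard. The two CO2 estimates differ by a factor of about 35.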


Dec 24

PDF Database

PDF Database - pdf and doc search engine: This is a search engine that only indexes and searches PDFs and Word files.

I find this interesting for a particular reason: what does format say about content? Would we find a different mix of content in PDF and Word files than we could find in HTML?

I did some random searches:

The results were interesting. I’m trying to put my finger on if or why the content would be a different…flavor. I found a fair amount of sales presentations. Could it be that PDF and Word content is more directed to a specific audience, rather than for random, public dissemination?

Interested in your thoughts as to whether or why the content character and quality would be any different than HTML search results.


Nov 22

The Netflix Search Contest Continues...

If You Liked This, Sure to Love That - Winning the Netflix Prize: Here’s an engrossing article about the race to improve the Netflix recommendation algorithm, which we first talked about two years ago. The goal was to improve the accuracy of the current recommendations by 10%, and no one has claimed the prize yet, though some are very, very close.

Bertoni says it’s partly because of “Napoleon Dynamite,” an indie comedy from 2004 that achieved cult status and went on to become extremely popular on Netflix. It is, Bertoni and others have discovered, maddeningly hard to determine how much people will like it. When Bertoni runs his algorithms on regular hits like “Lethal Weapon” or “Miss Congeniality” and tries to predict how any given Netflix user will rate them, he’s usually within eight-tenths of a star. But with films like “Napoleon Dynamite,” he’s off by an average of 1.2 stars.
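For a feel of how "off by X stars" and "a 10% improvement" get measured, here is a small sketch. The contest scored entries by root mean squared error (RMSE); the "off by an average of" figures in the quote read more like a plain average error, so both are shown. The ratings below are invented purely for illustration and are not Netflix data.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average size of the miss, in stars."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; big misses are penalized more heavily."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def improvement_percent(baseline_error, new_error):
    """How much smaller the new error is, as a percentage of the baseline."""
    return 100 * (baseline_error - new_error) / baseline_error


# A love-it-or-hate-it film: real ratings are all 1s and 5s (made-up numbers),
# so predicting the middle misses everyone by two stars.
actual_ratings = [5, 1, 5, 1]
predicted_ratings = [3, 3, 3, 3]

print(mean_absolute_error(predicted_ratings, actual_ratings))  # 2.0
print(rmse(predicted_ratings, actual_ratings))                 # 2.0
```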


Nov 22

Google Kills Lively

Google Unplugs Lively as Hype Fades Over Virtual Worlds: Wow, this was fast. By comparison, Google Answers went on for four-and-a-half years.

Google portrayed the move away from user frivolity as a sound business move. “It has been a tough decision, but we want to ensure that we prioritize our resources and focus more on our core search, ads and apps business,” the company said on a blog post.

In the past Google has shown a willingness to let projects linger with a “beta” tag for months and months. But the virtual experiment was apparently not worth even that effort.


Nov 12

Google Flu Trends

Google Uses Web Searches to Track Flu’s Spread: This is ultra-awesome. This is John Battelle’s Database of Intentions put to good use.

Turns out a lot of ailing Americans enter phrases like “flu symptoms” into Google and other search engines before they call their doctors.

That simple act, multiplied across millions of keyboards in homes around the country, has given rise to a new early warning system for fast-spreading flu outbreaks, called Google Flu Trends.

Tests of the new Web tool from Google.org, the company’s philanthropic unit, suggest that it may be able to detect regional outbreaks of the flu a week to 10 days before they are reported by the Centers for Disease Control and Prevention.

Technically, this is the area of science known as epidemiology, which is the study of, among other things, the outbreak of disease. Blend once had a really good idea for a product in this space, but outbreak tracking and reporting is a fairly saturated market dominated by some really big players.


Oct 7

Mail Goggles

Stop sending mail you later regret: This has got to be a joke. I can’t find it in my Gmail this morning, at any rate.

When you enable Mail Goggles, it will check that you’re really sure you want to send that late night Friday email. And what better way to check than by making you solve a few simple math problems after you click send to verify you’re in the right state of mind?


Sep 11

List of Googlebot IPs

google.txt: I don’t know how well this is kept up-to-date, but it purports to be a list of all the IPs the Googlebot comes from.

This is of interest after the last post, because Google says you can detect the Googlebot User Agent and bypass login pages to index content behind subscription walls. But it also means that anyone can bypass your login page as well by changing their User Agent.

But, if this is an accurate list of Googlebot IPs, then you could detect both the User Agent and the IP. The trick, of course, is to make sure you have an up-to-date list of IPs. I imagine it changes a lot, but of the 281 lines in this file, several are Class C subnets, each of which would encompass ~250 other IPs, so the actual number of potential IPs in this list is into the thousands.
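A check against both the User Agent and the IP might look something like the sketch below. I'm assuming each line of the list is either a single IP or a subnet in CIDR notation, which may not match the actual format of google.txt; the function names are made up and the sample addresses are reserved documentation ranges, not real Googlebot IPs.

```python
import ipaddress


def load_networks(lines):
    """Parse lines holding a single IP or a CIDR subnet; skip blanks and comments."""
    networks = []
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def count_addresses(networks):
    """Total number of addresses covered by the list."""
    return sum(network.num_addresses for network in networks)


def is_listed_googlebot(user_agent, remote_ip, networks):
    """True only if the User Agent claims Googlebot AND the IP is on the list."""
    if "googlebot" not in user_agent.lower():
        return False
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in networks)


sample_lines = ["192.0.2.0/24", "", "198.51.100.7"]
networks = load_networks(sample_lines)
print(count_addresses(networks))  # 257
print(is_listed_googlebot("Mozilla/5.0 (compatible; Googlebot/2.1)", "192.0.2.44", networks))   # True
print(is_listed_googlebot("Mozilla/5.0 (compatible; Googlebot/2.1)", "203.0.113.9", networks))  # False
```

A single /24 line covers 256 addresses, which is why a 281-line file can add up to thousands of IPs.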


Sep 11

The Googlebot and Subscription Sites

Registration/subscription sites: You are apparently allowed to detect the Googlebot and allow it to bypass your login page and crawl content behind it. This means that subscription content will appear in the Google index, but people will be prompted to log in when they click a search result.

Google News does include sites that require users to set up usernames and passwords to access content. Since crawlers can’t currently fill out registration or subscription forms, nor do they support cookies, we need to be able to circumvent those pages in order to successfully crawl those sites. The easiest way to do this is to configure your servers to not serve the registration or subscription page to our crawlers (when the User-Agent is “Googlebot”).

It also means, of course, that you could change your User Agent string and bypass the login page for any site that does this.

Via a good discussion on Reddit about Experts Exchange and how they do some funky things with Google.

Just for giggles, I downloaded the User Agent Switcher Firefox extension and confirmed that when you change your User Agent to impersonate the Googlebot, all the answers appear uncloaked on Experts Exchange pages.
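The weakness is easy to see if you sketch what a User-Agent-only gate amounts to. This is a guess at the general shape of such a check, not anyone's actual code; the function name and the browser string are invented.

```python
def should_show_login(user_agent):
    """Naive gate: skip the login page whenever the User-Agent mentions Googlebot."""
    return "googlebot" not in user_agent.lower()


real_crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
ordinary_browser = "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0"
# A browser with a switched User-Agent sends exactly the same string as the crawler.
spoofed_browser = real_crawler

print(should_show_login(ordinary_browser))  # True
print(should_show_login(real_crawler))      # False
print(should_show_login(spoofed_browser))   # False
```

The server only ever sees the string, so it has no way to tell the last two requests apart.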


Sep 6

Why does Google Knol's search suck so much?

Update: See the comments. Google stopped by to say their search was temporarily broken and it’s fixed now.

Has anyone else realized that the search feature on Google Knol really, really sucks? I don’t get it. Did Google forget everything it knew about search when they created this thing?

I searched for “content management system” (without quotes). Here were the first five results:

Um…what? At the same time, these titles were NOT returned:

I put “content management system” in quotes, and the results got a little better, but not by much. The first three results are still unrelated, but the “Nine reasons…” knol becomes result number four and Joomla! shows up at number five.

I don’t get it. I think Google knows a couple things about search, but you wouldn’t know it from this mess.


