By Deane Barker on July 20, 2003

We unleashed our new search page over the weekend. To make it work correctly, we had to retroactively keyword all past entries (an endeavor about which I will write more later), but with that done, it seems to be working quite well.

The engine returns results in two groups: (1) those in which the search term appears in the title or keywords (Best Bets), and (2) those in which the search term appears anywhere else (The Rest). The second group is de-duped, so items from Best Bets don’t show up in The Rest. Searches like RSS, ColdFusion, and NewsGator demonstrate that it works pretty well.
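The two-group split described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code behind the search page: the `search` function, the entry fields (`title`, `keywords`, `body`), and the sample entries are all assumptions made up for the example.

```python
def search(entries, term):
    """Split matching entries into (best_bets, the_rest)."""
    needle = term.lower()
    best_bets, the_rest = [], []
    for entry in entries:
        in_title = needle in entry["title"].lower()
        in_keywords = any(needle in k.lower() for k in entry["keywords"])
        if in_title or in_keywords:
            best_bets.append(entry)
        elif needle in entry["body"].lower():
            # Only reached when the entry is not a Best Bet,
            # so The Rest never repeats a Best Bets item.
            the_rest.append(entry)
    return best_bets, the_rest


# Hypothetical sample data.
entries = [
    {"id": 1, "title": "RSS readers compared", "keywords": ["RSS", "NewsGator"],
     "body": "RSS is everywhere."},
    {"id": 2, "title": "Weekend reading", "keywords": [],
     "body": "A note on RSS feeds."},
    {"id": 3, "title": "ColdFusion tips", "keywords": ["ColdFusion"],
     "body": "Nothing relevant."},
]

best, rest = search(entries, "RSS")
print([e["id"] for e in best], [e["id"] for e in rest])
```

Entry 1 mentions the term in its title, keywords, and body, yet it lands only in Best Bets because the body check runs only for entries that missed the first group.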

(In the process of “keywording” everything, we had to make some distinctions between what’s a category and what’s a keyword. I touched on the process in this entry, but in practice, it wasn’t so clear. We’re going to stick with the categories for grouping under a particular topic, and we limited keywords to proper nouns — protocol names, companies, software, etc.)

Finally, we wanted to use the full-text indexing capabilities of MySQL, but we’re on a hosted server and the minimum word length for this server is set to four characters. Thus, searches for three-letter acronyms like RSS, IBM, XML, SCO, AIX, etc. would return nothing. Consequently, the page uses a set of LIKE queries. We were concerned a bit about speed, but it’s vastly faster than the default Movable Type search, so that fear seems misplaced.
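To show why a LIKE query sidesteps the minimum-word-length problem, here is a small sketch. It uses SQLite through Python's `sqlite3` module as a stand-in for MySQL, and the `entries` table, its columns, and the `like_search` helper are assumptions for illustration only. A `%term%` pattern is a plain substring match, so a three-letter term is found like any other.

```python
import sqlite3

# In-memory database with a hypothetical entries table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO entries (id, title, body) VALUES (?, ?, ?)",
    [
        (1, "RSS readers compared", "Some thoughts on aggregators."),
        (2, "Weekend reading", "IBM and AIX news."),
        (3, "ColdFusion tips", "Nothing relevant here."),
    ],
)


def like_search(term):
    """Return ids of entries whose title or body contains the term."""
    pattern = f"%{term}%"  # substring match; % and _ in the term are not escaped
    rows = conn.execute(
        "SELECT id FROM entries WHERE title LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    ).fetchall()
    return [row[0] for row in rows]


print(like_search("RSS"))
print(like_search("AIX"))
```

The trade-off is that a leading `%` generally prevents the database from using an ordinary index, so each search scans every row. That is acceptable for a weblog-sized table but worth watching as the archive grows.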

  1. I found a problem with the new search page. Search for “SCO.” You’ll find that you get hits on the words “diSCOunt,” “SCOtt,” and “OSCOM.” I need to work up a regex to make sure you’re matching just the word, not the sequence of characters. This is where full-text indexing would come in handy.
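One possible shape for that regex, sketched in Python: wrap the escaped search term in `\b` word-boundary anchors so it only matches as a standalone word. The `matches_whole_word` helper and the sample strings are hypothetical, and this is not necessarily the fix that ended up on the search page.

```python
import re


def matches_whole_word(term, text):
    """True if term appears in text as a whole word, ignoring case."""
    # re.escape keeps punctuation in the term from being read as regex syntax.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.search(text) is not None


samples = ["a discount offer", "Scott wrote this", "OSCOM conference", "SCO sued IBM"]
for sample in samples:
    print(sample, matches_whole_word("SCO", sample))
```

Only the last sample matches, since in the other three the letters "SCO" are surrounded by other word characters. One way to combine this with LIKE would be to use the LIKE query as a cheap first pass and apply the whole-word check to the rows it returns.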

