
The topterm utility returns the top-ranking Google result for a search term, from the command line.

Here’s a sample session:

unix_host$ python
Enter Search Term [or blank to quit]: justgord
TITLE : gordon anderson (justgord) on Twitter
URL   :
Enter Search Term [or blank to quit]: terry tao
TITLE : What's new
URL   :
Enter Search Term [or blank to quit]: quantblog vfuncs
TITLE : vfuncs – functional coding with array 'verbs' in C « quantblog
URL   :
Enter Search Term [or blank to quit]:

Verbose Description

Recently I was testing a machine learning algorithm that looks for relevance patterns in a graph of text nodes. This involved a small amount of web crawling, and to quickly check my output I needed a tool that could return the top-ranking web site for a given search term: a command-line google-it was needed!

I decided to hack this up in Python. One of the things that initially put me off the language is its use of whitespace to indicate nested program logic; I guess I’m just an old-skool bracket lover at heart. Putting that bias aside, Python is not a bad language at all, and I found myself writing surprisingly small programs in it, even if they don’t have the thought-as-code feel of Scheme or Clojure.

For this command-line app, I had hoped to reuse a Google API: ideally I’d request a URL with curl or wget and get back some nice JSON output that could be traversed trivially. Unfortunately, I could find no such API, so I had to resort to screen-scraping the results page to get the first entry. The curl utility does the heavy lifting of fetching the page, and BeautifulSoup traverses the HTML.
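A rough sketch of that fetch-and-scrape step follows. It is not the original topterm source, and several things in it are assumptions: the query URL (`https://www.google.com/search?q=`), the idea that each result is marked up as `<h3><a href="...">Title</a></h3>` (real Google markup differs and changes often), and the helper names `first_result` and `fetch_results_page`, which are hypothetical. It also swaps the standard library's `html.parser` in for BeautifulSoup so it runs without third-party packages, while still shelling out to `curl` for the download.

```python
import subprocess
import urllib.parse
from html.parser import HTMLParser


class FirstResultParser(HTMLParser):
    """Capture the href and text of the first <a> nested inside an <h3>.

    ASSUMPTION: results look like <h3><a href="...">Title</a></h3>.
    Links outside an <h3> (navigation, ads) are ignored.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &amp; etc. in text
        self.in_h3 = False
        self.in_link = False
        self.done = False
        self.url = None
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "h3":
            self.in_h3 = True
        elif tag == "a" and self.in_h3:
            self.in_link = True
            self.url = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == "a" and self.in_link:
            self.in_link = False
            self.done = True  # only the first result is wanted
        elif tag == "h3":
            self.in_h3 = False

    def handle_data(self, data):
        # Collect text inside the link, including nested tags such as <b>.
        if self.in_link and not self.done:
            self.title_parts.append(data)


def first_result(html):
    """Return (title, url) for the first result in the page, or None."""
    parser = FirstResultParser()
    parser.feed(html)
    parser.close()
    if parser.url is None:
        return None
    return "".join(parser.title_parts).strip(), parser.url


def fetch_results_page(term):
    """Download a results page with curl (needs network access).

    ASSUMPTION: the search URL format below; adjust as needed.
    """
    url = "https://www.google.com/search?q=" + urllib.parse.quote_plus(term)
    # -s: silent, -L: follow redirects, -A: send a browser-like User-Agent
    result = subprocess.run(
        ["curl", "-s", "-L", "-A", "Mozilla/5.0", url],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Keeping the parsing in `first_result`, separate from the download, means the scraping logic can be checked against saved HTML without touching the network, which helps when the page markup shifts.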

So now I have a handy little program that asks for search terms and prints the title and URL of the top result according to Google. It would be nice if something like this were an option in lynx.
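The prompt loop from the sample session could be sketched roughly as below. This is an illustration rather than the original code: `run_session` is a hypothetical name, `search` is assumed to be any callable that maps a term to a `(title, url)` pair or `None`, and the "No result found" message is an addition not shown in the session. Input and output are injectable so the loop can be exercised without a terminal.

```python
def run_session(search, read=input, write=print):
    """Prompt for search terms until a blank line, printing the top hit.

    `search` (hypothetical) maps a term to (title, url), or None if
    nothing was found. The prompt and TITLE/URL labels mirror the session.
    """
    while True:
        try:
            term = read("Enter Search Term [or blank to quit]: ").strip()
        except EOFError:
            break  # treat end of input (e.g. Ctrl-D) like a blank line
        if not term:
            break
        hit = search(term)
        if hit is None:
            write("No result found")  # added behaviour, not in the original
            continue
        title, url = hit
        write("TITLE : " + title)
        write("URL   : " + url)
```

Passing a dictionary's `.get` method as `search` and a list's `.append` as `write` is enough to try the loop offline; in real use, `search` would fetch and scrape a results page.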

I’ve posted the Python project source (less than a page) on Google Code: here