
Random Books: Jarno has a little application that picks random search terms to find a random book on Amazon. But dang if I didn't click that doohickey ten times without it ever coming up with a book. I decided to try to implement a random book finder that would reliably come up with books.

The main problem is the arbitrary Amazon Web Services restriction that you can't get more than 250 pages of query results. If it weren't for this restriction, you could pick a random number from a huge range and do a Browse Node search to get the corresponding page of results. This is still a pretty good solution if you want to search for products in a restricted set, like science fiction books: find the corresponding Browse Node and do a browseBestSellers.
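If you wanted to go that route, it would look something like this. (A sketch only, not tested against AWS: the browse node ID is a placeholder, and it assumes browseBestSellers takes the same page/type/associate arguments as searchByKeyword does.)

import amazon
import random

# Placeholder browse node ID; look up the real node for science fiction books.
SF_BROWSE_NODE = "25"

def random_sf_book(max_page=250, associate="crummthesite-20"):
    result = None
    while result is None and max_page >= 1:
        page = random.randint(1, max_page)
        try:
            results = amazon.browseBestSellers(SF_BROWSE_NODE, page=page,
                                               type="lite", associate=associate)
            if results:
                result = random.choice(results)
        except amazon.AmazonError:
            results = None
        if not results:
            # The node has fewer pages than we guessed; aim lower.
            max_page /= 2
    return result

It uses the same halve-on-failure trick as the code below, since a given browse node will usually have far fewer than 250 pages of bestsellers.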

As it is, the simplest solution is to only search for a single word, instead of Jarno's two. Here's some code that does that and works pretty well.

import amazon
import random

def random_product(keyword, max_page=250, associate="crummthesite-20"):
    result = None
    # If the page we pick turns out to be past the end of the results, halve
    # max_page and try again; give up once it's been halved down to nothing.
    while result is None and max_page >= 1:
        page = random.randint(1, max_page)
        try:
            results = amazon.searchByKeyword(keyword, page=page, type="lite",
                                             associate=associate)
            if results:
                result = random.choice(results)
        except amazon.AmazonError:
            results = None
        if not results:
            max_page /= 2
    return result

def format_product(p):
    print '<a href="%s">%s</a>' % (p.URL, p.ProductName)

# Pick a random word out of a word list, and a random product for that word.
keyword = random.choice(open("american.0").readlines()).strip()
format_product(random_product(keyword))

Sample book: Reutilization of Waste Materials.

The other solution is screen-scraping. As covered earlier, there is an Amazon URL such that you can randomly vary a parameter and get a more or less random set of books. Here's some code that fetches one of those URLs and grabs an ASIN at random. Then it does an AWS lookup of that ASIN, because life is too short to scrape all the appropriate information off that big ol' page. This is a little less reliable than the earlier implementation, but it gets a better random distribution of books (since the other one picks a word to search on, and word distribution is not random).

from BeautifulSoup import BeautifulSoup
import amazon
import random
import urllib2
import re
import socket

URL = "http://www.amazon.com/exec/obidos/tg/new-for-you/top-sellers/-/books/all/books/0/1/%s/1/"

def random_book(max_page=150000, associate="crummthesite-20"):
    product = None
    # Same trick as before: if a page doesn't pan out, halve max_page.
    while product is None and max_page >= 1:
        url = URL % random.randint(1, max_page)
        req = urllib2.Request(url)
        try:
            soup = BeautifulSoup(urllib2.urlopen(req).read())
            # The page hides its ASINs in inputs named "asin.[the ASIN]".
            asins = soup.fetch("input", {"type" : "hidden",
                                         "name" : re.compile(r"asin\..*")})
            if asins:
                asin = random.choice(asins)['name'][5:]
                product = amazon.searchByASIN(asin, "lite",
                                              associate=associate)[0]
        except (socket.error, urllib2.URLError, amazon.AmazonError):
            product = None
        if not product:
            max_page /= 2
    return product

def format_product(p):
    print '<a href="%s">%s</a>' % (p.URL, p.ProductName)

format_product(random_book())

Sample book: Choice, Welfare and Measurement


Comments:

Posted by Jarno Virtanen at Thu Mar 02 2006 00:53

Mine did actually search for one word only, but the host machine cannot connect to Amazon right now for some reason. The "error message" could have been clearer. What it is supposed to mean is that neither of those terms returned any results. (In this case because of problems connecting to Amazon.) The idea was to try another word if the first didn't give any results.

I blame the Web 2.0 meme for me not bothering to finish the app decently. ;-)

Posted by Leonard at Thu Mar 02 2006 08:34

Ah, that makes sense. I thought you were going for a Googlewhack.



Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.