Incorporating Google Books into the Hit-list

So the folks over at Google Books think they can go ahead and incorporate our catalogs into their search, do they?

Actually, that's fine, I have no problem with that, which means... They should have no problem with me incorporating Google Books into our hit-list. Right?

Now when users search the AADL catalog, they will be given the option to peek inside the books on the hit-list--that is, if there is a record over at Google Books. Basically, the first time that record is displayed in the list, the middleware queries Google Books to see if it has that item in its database. If it does, the middleware makes note of that in a MySQL table so that the remote query doesn't need to be run again. That way, future queries save time and bandwidth.
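The check-then-cache flow described above can be sketched roughly as follows. This is only an illustration, not the actual middleware: it is written in Python rather than PHP, uses an in-memory SQLite table as a stand-in for the MySQL table, and the names `gbooks_cache`, `open_cache`, `has_google_books_record`, and `remote_check` are all hypothetical. It also assumes that negative results are cached alongside positive ones, which the post does not state.

```python
import sqlite3
from typing import Callable


def open_cache() -> sqlite3.Connection:
    """Create the lookup cache (SQLite here, standing in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gbooks_cache ("
        " isbn TEXT PRIMARY KEY,"
        " has_record INTEGER NOT NULL)"
    )
    return conn


def has_google_books_record(
    conn: sqlite3.Connection,
    isbn: str,
    remote_check: Callable[[str], bool],
) -> bool:
    """Return True if Google Books is known to have this ISBN.

    The remote check (hypothetical callable) only runs the first time
    an ISBN is seen; later calls are answered from the cache table.
    """
    row = conn.execute(
        "SELECT has_record FROM gbooks_cache WHERE isbn = ?", (isbn,)
    ).fetchone()
    if row is not None:
        return bool(row[0])

    # Cache miss: ask the remote service once and remember the answer.
    found = bool(remote_check(isbn))
    conn.execute(
        "INSERT INTO gbooks_cache (isbn, has_record) VALUES (?, ?)",
        (isbn, int(found)),
    )
    conn.commit()
    return found
```

In a sketch like this, `remote_check` would wrap whatever HTTP request the middleware makes; passing it in as a parameter keeps the caching logic separate from the network call.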

Looking at the Syndetics offerings next to it, this seems like a much richer and more useful resource. Enjoy!

** Update 1: 8/24/06 9:45 PM **

Ha! It looks like that was short-lived! (Thanks to Ryan for giving me the heads-up.) Google apparently doesn't return the favor:

We're sorry...

... but your query looks similar
to automated requests from a computer virus or spyware
application. To protect our users, we can't process your request
right now.

We'll restore your access as quickly as possible, so try again soon. In the meantime, if you suspect that your computer or network has been infected,
you might want to run a virus checker or spyware remover to make sure that your systems are free of viruses and other spurious software.

We apologize for the inconvenience, and hope we'll see you again on Google.

And here I was, trying to be nice by caching the results... Guess we'll have to wait for the API.

** Update 2: 8/25/06 8:50 AM **

So, I think I found a way to fix this. Essentially, the way I was previously determining if Google Books has a record for an ISBN was by using this URL template:

http://books.google.com/books?vid=ISBN$isbn&printsec=frontcover&dq=isbn:$isbn

Now I'm using a different URL that does not return 404:

http://books.google.com/books?as_isbn=$isbn

With the original URL, Google would throw a 404 whenever there was no record for that ISBN. I think the fact that one IP was requesting so many 404s is what spooked Google, not the retrieval rate. I also noticed that I could no longer use wget on the command line to grab the data--Google would return a 403 (Forbidden). So my thought was to ditch PHP's file_get_contents for cURL, which allows you to spoof a user agent. I took a peek at our Apache logs and chose:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6

So, instead of looking like a "virus or spyware", the script now appears, to Google, as an extremely zealous Google Books user. We'll see how long it lasts, but it seems to be holding...
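For illustration, here is roughly what building that request looks like. This is a Python sketch using the standard library, not the PHP/cURL code the middleware actually runs; the function name `build_lookup_request` is made up for the example, and the URL and user agent string are the ones quoted above. The sketch only constructs the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# User agent string quoted in the post (taken from the Apache logs).
USER_AGENT = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) "
    "Gecko/20060728 Firefox/1.5.0.6"
)


def build_lookup_request(isbn: str) -> Request:
    """Build (but do not send) a lookup request for one ISBN.

    Uses the as_isbn URL form from the post and attaches the
    browser-like User-Agent header.
    """
    url = "http://books.google.com/books?" + urlencode({"as_isbn": isbn})
    return Request(url, headers={"User-Agent": USER_AGENT})
```

Handing the returned object to `urllib.request.urlopen` would actually perform the fetch; that step is left out here on purpose.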

** Update 3: 8/25/06 11:40 AM **

No go, they've blocked us again. I'm sending an email to the kind folks at Google Books, and we'll see if they reply. Until then, I've got a few more tricks up my sleeve... In the meantime, I'll leave the cached information active...

** Update 4: 8/25/06 4:07 PM **

Google scores major points in my book! One of the managers over at Google Books just emailed me to say that he likes the idea of the hit-list links and that he is going to see if they can accommodate these types of queries.

[tags] Google, GoogleBooks, Sneaky, AADL, Library, OPAC, Catalog [/tags]

13 Comments so far

John, it seems to add that link to all results. I get a 404 when clicking on it for most. An example is a search for “perl”. The hacks you show works correctly, but even the DVD on the list has a link.

Ryan,
You’re right, but read the update for an explanation…

Now fixed, Ryan, thanks!

Wow, nice work!

[...] I read over on blyberg.net that Google Book Search was including Find in Library links so I went to Google Book Search and did a search for my handy “PHP Cookbook” - no library links - then I did a search for “Anna Karenina” - no library links. What am I missing? [...]

[...] John Blyberg has an excellent post on how he incorporated Google Book Search results into his own library’s catalogue. Also: Techdirt has an article on re-thinking the music industry business model, by treating free content as promotional material, rather than lost content. [...]

Sweet! Please keep us posted on what you learn from Google in terms of supporting this type of linking in the future.

[...] Incorporating Google Books into the Hit-list Hit-list links to preview book text within the ILS. (tags: opac catalog) [...]

[...] blyberg.net » Incorporating Google Books into the Hit-list (tags: OPAC) Tags [...]

Neat-o! I wish Google would understand that this is a kind of win-win crosslinking, and officially let us do it. Perhaps, adding GBooks to the API?

Alejandro,
They do, in fact. I think they are committed to working with libraries (as long as it benefits them in some way, of course). In a case like this, it’s a matter of finding the personnel time to make it happen.

[...] Last week, Blyberg had a nice post about incorporating links to “look inside this book” at Google Books into the AADL OPAC. It’s the sort of thing that immediately gets me really excited since enriching the OPAC is one of my main interests at work. Of course, I was all set to figure out how to do the same for our OPAC. But when I showed the feature to my boss, she was distinctly underwhelmed. She pointed out a serious flaw with the whole approach which I had missed, I think because I was not seeing the forest for the trees (or something like that). And that problem which I initially missed but now agree makes it not a great idea for us is one of the extent of Google Books’ coverage as it relates to what is in our OPAC. The hit rates to “look inside this book” at Google Books for our collections are simply too low. Based on a very unscientific survey, I would say they are in the single digits percentage-wise. Amazon’s “search inside the book” seemed to have slightly better coverage but still very low. Presumably this situation will improve over time as copyright issues relating to these services get resolved but I can’t see offering these types of links until coverage approaches 50%. It just doesn’t make sense to me to offer a feature like this for a small minority of our collections. Still, it’s definitely an idea with potential. [...]

“One of the managers over at Google Books just emailed me to say that he likes the idea of the hit-list links and that he is going to see if they can accommodate these types of queries.”

So that was a year ago. What happened? I too would like to incorporate this kind of functionality in a library software product I am working on.


