So the folks over at Google Books think they can go ahead and incorporate our catalogs into their search, do they?
Actually, that's fine. I have no problem with that, which means... they should have no problem with me incorporating Google Books into our hit-list. Right?
Now when users search the AADL catalog, they will be given the option to peek inside the books on the hit-list, provided Google Books has a record for them. The first time a record is displayed in the list, the middleware queries Google Books to see whether it has that item in its database. If it does, the middleware notes that in a MySQL table so the remote query doesn't need to be run again. That way, future searches save time and bandwidth.
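For anyone wanting to try the same "look it up once, then remember the answer" pattern, here is a minimal sketch in Python. It uses the standard library's sqlite3 as a stand-in for the MySQL table, and the table name `gbooks_cache`, the function `has_google_record`, and the injected `remote_lookup` callable are all hypothetical; the post doesn't show the actual middleware's schema or code.

```python
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a hypothetical table mapping ISBN -> 'Google Books has it'."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gbooks_cache ("
        " isbn TEXT PRIMARY KEY,"
        " has_record INTEGER NOT NULL)"
    )
    return conn


def has_google_record(conn: sqlite3.Connection, isbn: str,
                      remote_lookup: Callable[[str], bool]) -> bool:
    """Return the cached answer for `isbn`, calling the remote service only on a cache miss."""
    row = conn.execute(
        "SELECT has_record FROM gbooks_cache WHERE isbn = ?", (isbn,)
    ).fetchone()
    if row is not None:
        return bool(row[0])  # cache hit: no remote request needed
    found = remote_lookup(isbn)  # first time this record is displayed
    conn.execute(
        "INSERT INTO gbooks_cache (isbn, has_record) VALUES (?, ?)",
        (isbn, int(found)),
    )
    conn.commit()
    return found
```

This sketch records misses as well as hits, so a title Google Books lacks also costs only one remote query. The post only says that hits are noted in the table, so caching misses is an assumption of the sketch, not necessarily what the AADL middleware does.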
Compared with the Syndetics offerings next to it, this seems like a much richer and more useful resource. Enjoy!
** Update 1: 8/24/06 9:45 PM **
Ha! It looks like that was short-lived! Google apparently doesn't return the favor (thanks to Ryan for giving me the heads-up):
We're sorry...
... but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can't process your request right now. We'll restore your access as quickly as possible, so try again soon. In the meantime, if you suspect that your computer or network has been infected, you might want to run a virus checker or spyware remover to make sure that your systems are free of viruses and other spurious software. We apologize for the inconvenience, and hope we'll see you again on Google.
And here I was, trying to be nice by caching the results... Guess we'll have to wait for the API.
** Update 2: 8/25/06 8:50 AM **
So, I think I found a way to fix this. Previously, I was determining whether Google Books had a record for an ISBN by using this URL template:
http://books.google.com/books?vid=ISBN$isbn&printsec=frontcover&dq=isbn:$isbn
Now I'm using a different URL, which does not return a 404:
http://books.google.com/books?as_isbn=$isbn
With the old URL, if there was no record for that ISBN, Google would throw a 404. I think the fact that one IP was requesting so many 404s is what spooked Google, not the retrieval rate. I also noticed that I could no longer use wget on the command line to grab the data: Google would return a 403 (Forbidden). So my thought was to ditch PHP's file_get_contents for cURL, which allows you to spoof a user agent. I took a peek at our Apache logs and chose:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
So, instead of looking like a "virus or spyware", the script now appears to Google as an extremely zealous Google Books user. We'll see how long it lasts, but it seems to be holding...
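For readers doing something similar in Python rather than PHP and cURL, here is a sketch of the two pieces described above: filling in the `as_isbn` URL template and attaching a browser-style User-Agent header to the request. The helper names `normalize_isbn` and `build_lookup_request` are hypothetical, and the code only builds the request object without touching the network. As the next update shows, the user-agent trick did not keep the script from being blocked.

```python
import urllib.parse
import urllib.request

# URL template from the post, with the ISBN as the only parameter.
AS_ISBN_TEMPLATE = "http://books.google.com/books?as_isbn={isbn}"

# The browser string the post copied from the library's Apache logs.
FIREFOX_UA = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) "
              "Gecko/20060728 Firefox/1.5.0.6")


def normalize_isbn(raw: str) -> str:
    """Hypothetical helper: drop hyphens/spaces so '0-596-00378-X' -> '059600378X'."""
    return "".join(ch for ch in raw if ch.isalnum()).upper()


def build_lookup_request(raw_isbn: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the as_isbn lookup page."""
    isbn = normalize_isbn(raw_isbn)
    url = AS_ISBN_TEMPLATE.format(isbn=urllib.parse.quote(isbn))
    return urllib.request.Request(url, headers={"User-Agent": FIREFOX_UA})
```

Passing the result to `urllib.request.urlopen` would actually send it. Keep in mind that `urlopen` raises `urllib.error.HTTPError` for 4xx responses such as the 404s and 403s described above, so a real lookup would need to catch that exception rather than treat it as a crash.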
** Update 3: 8/25/06 11:40 AM **
No go; they've blocked us again. I'm sending an email to the kind folks at Google Books, and we'll see if they reply. Until then, I've got a few more tricks up my sleeve... In the meantime, I'll leave the cached information active.
** Update 4: 8/25/06 4:07 PM **
Google scores major points in my book! One of the managers over at Google Books just emailed me to say that he likes the idea of the hit-list links and that he is going to see if they can accommodate these types of queries.
[tags] Google, GoogleBooks, Sneaky, AADL, Library, OPAC, Catalog [/tags]

13 Comments so far
John, it seems to add that link to all results. I get a 404 when clicking on it for most. An example is a search for “perl”. The hacks you show work correctly, but even the DVD on the list has a link.
By Eby on 08.24.06 7:59 pm
Ryan,
You’re right, but read the update for an explanation…
By john on 08.24.06 8:58 pm
Now fixed, Ryan, thanks!
By john on 08.25.06 8:38 am
Wow, nice work!
By Edward Vielmetti on 08.25.06 8:40 am
[…] I read over on blyberg.net that Google Book Search was including Find in Library links so I went to Google Book Search and did a search for my handy “PHP Cookbook” - no library links - then I did a search for “Anna Karenina” - no library links. What am I missing? […]
By What I Learned Today… » Blog Archive » Google Book Search & Libraries on 08.25.06 12:16 pm
[…] John Blyberg has an excellent post on how he incorporated Google Book Search results into his own library’s catalogue. Also: Techdirt has an article on re-thinking the music industry business model, by treating free content as promotional material, rather than lost content. […]
By VALIS » Blog Archive » Google Book Search; the music industry business model on 08.27.06 2:32 am
Sweet! Please keep us posted on what you learn from Google in terms of supporting this type of linking in the future.
By Mike on 08.28.06 1:22 pm
[…] Incorporating Google Books into the Hit-list Hit-list links to preview book text within the ILS. (tags: opac catalog) […]
By blogdriverswaltz.com » Blog Archive » links for 2006-08-28 on 08.28.06 6:19 pm
[…] blyberg.net » Incorporating Google Books into the Hit-list (tags: OPAC) Tags […]
By Kenton Good » links for 2006-08-29 on 08.29.06 6:20 pm
Neat-o! I wish Google would understand that this kind of cross-linking is a win-win, and officially let us do it. Perhaps by adding GBooks to the API?
By Alejandro on 09.04.06 1:52 pm
Alejandro,
They do, in fact. I think they are committed to working with libraries (as long as it benefits them in some way, of course). In a case like this, it’s a matter of finding the personnel time to make it happen.
By john on 09.04.06 2:44 pm
[…] Last week, Blyberg had a nice post about incorporating links to “look inside this book” at Google Books into the AADL OPAC. It’s the sort of thing that immediately gets me really excited, since enriching the OPAC is one of my main interests at work. Of course, I was all set to figure out how to do the same for our OPAC. But when I showed the feature to my boss, she was distinctly underwhelmed. She pointed out a serious flaw with the whole approach which I had missed, I think because I was not seeing the forest for the trees (or something like that). That problem, which I initially missed but now agree makes it not a great idea for us, is the extent of Google Books’ coverage relative to what is in our OPAC. The hit rates to “look inside this book” at Google Books for our collections are simply too low. Based on a very unscientific survey, I would say they are in the single digits percentage-wise. Amazon’s “search inside the book” seemed to have slightly better coverage, but it was still very low. Presumably this situation will improve over time as copyright issues relating to these services get resolved, but I can’t see offering these types of links until coverage approaches 50%. It just doesn’t make sense to me to offer a feature like this for a small minority of our collections. Still, it’s definitely an idea with potential. […]
By ex libris » libraries, music, tech » Blog Archive » google books in the opac on 09.22.06 1:13 pm
“One of the managers over at Google Books just emailed me to say that he likes the idea of the hit-list links and that he is going to see if they can accommodate these types of queries.”
So that was a year ago. What happened? I too would like to incorporate this kind of functionality in a library software product I am working on.
By Jonathan Rochkind on 09.11.07 10:16 am