
FRBRizing Amazon’s Catalog

March 24, 2008 – 2:06 pm

While doing research for a book data project, I stumbled on an interesting discovery. One movement that has been gaining steam recently among librarians and others involved with book information is FRBR (Functional Requirements for Bibliographic Records), which among other things seeks to link various editions of the same book together. For example, all of the Harry Potter books are currently entered into library and bookstore catalogs separately, and there is no easy way to find a translation of a specific volume. FRBR seeks to fix that by rethinking catalogs as a hierarchy where the Work (in this case Harry Potter) is linked to each of its editions and translations (Expressions and Manifestations).

Unfortunately, it is not an easy task to figure out what links to what. There are currently two public approaches. The first is algorithmic: OCLC, which operates the WorldCat service, uses computers to automatically match up different editions. Their service is called xISBN and is limited to 500 queries a day and non-commercial use. OpenLibrary is doing something similar with an algorithm. The other approach relies on people, and this is what LibraryThing is doing with their thingISBN service. People who use LT to catalog their books have the option to specify that particular editions are in fact the same work. That cumulative data is published via their API.

About two weeks ago I accidentally stumbled on a third public service that does something similar. When Amazon launched their Kindle eBook reader, they made lots of titles available as Kindle eBooks. However, they did not want to change the ISBNs for these titles. So what they did was reorganize their catalog in such a way that all editions of the same work now appear to be linked together, including audio, eBook, hardcover, etc. This ability is buried in their API right here and is called RelatedItems:

 The RelatedItems response group returns items related to an item specified in an ItemLookup request. Related items could be, for example, all of the Unbox episodes in a TV season that are sold separately, or, for example, all of the MP3Download tracks on a MP3 album.

For books this is described visually as follows:

Each item now has a parent “authority title” which lists all of the children which are editions of that work. The authority title corresponds to Work in FRBR and the children are Expressions and Manifestations.
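To make the parent/child relationship concrete, here is a minimal sketch in Python. The record layout, the item IDs and the function names are all made up for illustration; this is not Amazon's actual response format, just one way the "authority title" idea could be modeled once the data has been retrieved.

```python
from collections import defaultdict

# Hypothetical, flattened records: (item_id, binding, parent_id).
# The IDs are placeholders, not real ASINs or ISBNs.
RECORDS = [
    ("ITEM-HC", "Hardcover", "WORK-1"),
    ("ITEM-PB", "Paperback", "WORK-1"),
    ("ITEM-KINDLE", "Kindle Edition", "WORK-1"),
    ("ITEM-OTHER", "Paperback", "WORK-2"),
]


def group_by_work(records):
    """Map each parent (the FRBR Work) to the list of its child editions."""
    works = defaultdict(list)
    for item_id, binding, parent_id in records:
        works[parent_id].append((item_id, binding))
    return dict(works)


def sibling_editions(records, item_id):
    """Return the other editions that share a parent with item_id."""
    parent_of = {item: parent for item, _binding, parent in records}
    parent_id = parent_of.get(item_id)
    if parent_id is None:
        return []  # unknown item: nothing to link
    return [item for item, _binding, parent in records
            if parent == parent_id and item != item_id]
```

Given one edition, looking up its parent and then listing the parent's children is all it takes to find the related editions.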

Practically speaking, what this means is that book-related websites now have another way to figure out how different editions are related to each other. Obviously this service is skewed towards stuff that is actually being sold, but it does show some surprising results. For example, this search for one of the Pendragon books shows a few extra editions in Amazon that neither LibraryThing nor WorldCat lists. On the other hand, the canonical Dune example shows better data in WorldCat and LT than in Amazon.

I am currently working on a possible service that will use Amazon’s data as a supplement to xISBN and thingISBN. Because of legal issues, this will not be a web service but rather code that people can use with their own AWS accounts. The service is tentatively called amazingISBN. I am also releasing an experimental web tool that retrieves data from all three services side by side right here. Due to use restrictions and underlying legal issues, this service is highly experimental.
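A side-by-side comparison only works if the ISBNs from each source are in the same form, since one service may return an ISBN-10 and another the ISBN-13 for the same edition. The sketch below is not the amazingISBN code; it is a rough illustration in Python that assumes the ISBN lists have already been fetched, with function names of my own choosing. It converts everything to ISBN-13 with the standard check-digit rule and reports which sources know about each edition.

```python
def to_isbn13(raw):
    """Normalize an ISBN-10 or ISBN-13 string to a bare ISBN-13."""
    digits = raw.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13:
        return digits
    if len(digits) != 10:
        raise ValueError("not an ISBN-10 or ISBN-13: %r" % raw)
    # Drop the old check digit, prefix 978, and recompute the check digit
    # using alternating weights of 1 and 3.
    core = "978" + digits[:9]
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def compare_sources(results):
    """Map each normalized ISBN to the sorted list of sources that returned it.

    results is a dict of source name -> iterable of ISBN strings.
    """
    seen = {}
    for source, isbns in results.items():
        for raw in isbns:
            seen.setdefault(to_isbn13(raw), set()).add(source)
    return {isbn: sorted(sources) for isbn, sources in seen.items()}
```

An edition whose list contains only one source name is one that the other two services are missing, which is exactly the kind of gap described above.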

Simple Solution for Amazon’s Web Services Reliability

February 16, 2008 – 11:25 pm

This past Friday a major 2 1/2 hour outage of Amazon’s web services hit the Internet. The blogs were abuzz with the gory details, but the resounding theme has been that the service is not reliable enough yet. As an AWS user, and like many others who visit their forums, I know well that Amazon’s web services experience issues on a frequent basis (although not as bad as Friday’s). This is why, for example, my own projects that use Amazon’s web services do so on an asynchronous basis.

In my personal opinion, there is a very simple solution to make their services very reliable and to make people trust them: just host all of Amazon.com’s images on the web services. Since Amazon as a whole makes billions from its sites versus a paltry 130 million from the web services, that would force the web services team to provide high reliability and would restore trust in the service.

Just my own two cents.

Publishing web pages on Amazon’s S3 service

March 23, 2006 – 11:18 am

Amazon recently released their new S3 web service designed to cheaply store any kind of data. Some enterprising users have started using this for web hosting! Here is an example (and related blog post).

[NOTE: I do not necessarily agree with the “content” of the example.]

Amazon Associates Stats via RSS

November 30, 2005 – 12:29 am

Since adding Amazon ads to my blog, I have been looking around the Net for a way to track the Amazon referral statistics via RSS. The closest thing I found was this piece of code which retrieves the XML version of Amazon reports. I wrote my own version of it and combined it with an XSLT stylesheet for transforming the data into RSS, and voila, amazon2rss was born. You can download version 0.1 right here.
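For readers who just want the gist of the transformation, here is a rough Python sketch. It is not the amazon2rss code: it uses the standard library's ElementTree instead of XSLT, and the input format (a report element with item rows carrying title, clicks and ordered attributes) is a made-up stand-in, since the real Amazon report layout is not reproduced here. The channel link is a placeholder too.

```python
import xml.etree.ElementTree as ET

# Hypothetical report layout, for illustration only.
SAMPLE_REPORT = """<report>
  <item title="Example Book One" clicks="12" ordered="1"/>
  <item title="Example Book Two" clicks="3" ordered="0"/>
</report>"""


def report_to_rss(report_xml, channel_title="Amazon Associates Stats"):
    """Turn each report row into one RSS 2.0 item and return the feed as text."""
    source = ET.fromstring(report_xml)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = "http://example.org/"  # placeholder
    ET.SubElement(channel, "description").text = "Referral statistics"
    for row in source.iter("item"):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row.get("title", "")
        ET.SubElement(item, "description").text = "clicks: %s, ordered: %s" % (
            row.get("clicks", "0"), row.get("ordered", "0"))
    return ET.tostring(rss, encoding="unicode")
```

Calling report_to_rss(SAMPLE_REPORT) yields a feed with one item per report row, which any RSS reader should be able to pick up once it is served from a URL.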

BookChaser: an Ajax Book Search Engine

May 26, 2005 – 1:19 am

A very long time ago (about five years) I had the bright idea of starting a new search engine like IMDB but for books. Eventually I purchased the BookChaser.com domain name and have held on to it ever since. At some point a bunch of people like me got together and formed the Internet Book Database Project. Unfortunately, waning interest and other time constraints eventually disbanded our little group and nothing ever came of it.

Looking at the same idea after five years, I suddenly see new hope. Two of the main problems all along have been (1) getting all the book data and (2) getting enough money to run servers to store that data. Now, looking at what’s out there, including AJAX and web services for sites like Amazon, I suddenly see those problems solved. So here is a short summary of what I think BookChaser should look like (but unfortunately I haven’t got the time to code it):

1. Full-AJAX application in HTML requiring no other servers.
2. Primary search is done against Amazon’s database via AWS and XMLHttpRequest with results displayed directly in the browser (I have done something similar for UPS’s XML API with AJAX).
3. Search against other book stores to retrieve prices just like Book Burro does in Firefox.
4. Gets information from libraries based on LibraryLookup from Jon Udell.
5. Maybe even add Z39.50 support via some toolkit that calls Z39.50 gateways directly.
6. Add support for reviews via AllConsuming, IBList, and others.

The only downsides that I see with this approach are that all of this would run very slowly in a browser and put a large load on all of the services involved. A better approach might be to provide some generic server-side caching for these web services in some instances (but that would kill all of the fun).
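As a rough idea of what such a caching layer could look like, here is a small time-based cache in Python. It is a sketch only, not part of BookChaser: the class name, the one-hour default and the fetch callback are my own assumptions, and a real deployment would also have to respect each service's terms of use on caching.

```python
import time


class TTLCache:
    """Remember results of fetch(key) for ttl_seconds before asking again."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch      # function that actually calls the remote service
        self._ttl = ttl_seconds  # how long a cached answer stays fresh
        self._clock = clock      # injectable so the cache can be tested
        self._store = {}         # key -> (time stored, value)

    def get(self, key):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no remote call
        value = self._fetch(key)
        self._store[key] = (now, value)
        return value
```

Wrapping each remote lookup in something like this means repeated searches for the same ISBN hit the remote service once per hour rather than once per page view.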

UPDATE: I took some time and coded up a prototype. BookChaser v0.1 is available here and here. Leave comments on this post OR send email to code \at\ shaftek {dot} org.