Interface Updates for PublicDomainReprints.org
April 29, 2008 – 9:53 pm
I made some minor interface updates:
1. A new section lists titles that are pending in the queue to be processed.
2. Pages for titles already processed now include cover images and metadata from the original archives, as well as an enhanced listing of reprints and links to external services such as Amazon and LibraryThing. Here is an example.
On Vacation
April 16, 2008 – 9:52 pm
I am going away on vacation until April 30th. If you contact me between now and that date, rest assured that your message was received, but you will not get a reply until after that date. For emergencies, please call me at 410-696-4611.
As a precaution, PublicDomainReprints.org processing of requests has been disabled, since no one will be available to take care of things if something goes wrong. Pending requests will be processed after April 30th, 2008.
New AJAX Search for PublicDomainReprints.org
March 30, 2008 – 11:05 pm
Over the weekend I coded up a new beta search function for PublicDomainReprints.org which uses Google’s AJAX Search API. This new search (which can be found here) lets people search for more public domain books and request reprints without leaving the site. The original search is still available on the same page (it works by redirecting people to the original archive).
In the works: hard cover support, archival quality reprints, splitting of large volumes, and more!
FRBRizing Amazon’s Catalog
March 24, 2008 – 2:06 pm
While doing research for a book data project, I stumbled on an interesting discovery. One movement that has been gaining steam recently among librarians and others involved with book information is FRBR, which among other things seeks to link the various editions of the same book together. For example, all of the Harry Potter books are currently entered into library and bookstore catalogs separately, and there is no easy way to find a translation of a specific volume. FRBR seeks to fix that by rethinking catalogs as a hierarchy in which the Work (in this case Harry Potter) is linked to each of its editions and translations (Expressions and Manifestations).
Unfortunately, it is not an easy task to figure out what links to what. There are currently two public approaches. The first uses computers: OCLC, which operates the WorldCat service, automatically tries to match up different editions. Their service is called xISBN and is limited to 500 queries a day and non-commercial use. OpenLibrary is doing something similar with an algorithm. The other approach uses people, and this is what LibraryThing is doing with their thingISBN service. People who use LT to catalog their books have the option to specify whether specific editions are in fact the same work. That cumulative data is published via their API.
About two weeks ago I accidentally stumbled on a third public service that does something similar. When Amazon launched their Kindle eBook reader, they made lots of titles available as Kindle eBooks. However, they did not want to change the ISBNs for these titles. So what they did was reorganize their catalog in a way that all editions of the same work now appear to be linked together, including audio, eBook, hard cover, etc. This ability is buried in their API right here and is called RelatedItems:
The RelatedItems response group returns items related to an item specified in an ItemLookup request. Related items could be, for example, all of the Unbox episodes in a TV season that are sold separately, or, for example, all of the MP3Download tracks on a MP3 album.
For books this is described visually as follows:

Each item now has a parent “authority title” which lists all of the children which are editions of that work. The authority title corresponds to Work in FRBR and the children are Expressions and Manifestations.
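To make the parent/child structure concrete, here is a minimal Python sketch of grouping editions under their parent work. The Edition class, its field names, and the sample IDs are all made up for illustration; this is not Amazon's response format, only the shape of the idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edition:
    """One sellable item (roughly a FRBR Manifestation)."""
    identifier: str  # ISBN or other item ID (placeholder values below)
    binding: str     # e.g. "hardcover", "audio", "ebook"
    parent_id: str   # ID of the parent "authority title" (the Work)


def group_by_work(editions):
    """Map each parent (Work) ID to the list of its child editions."""
    works = defaultdict(list)
    for edition in editions:
        works[edition.parent_id].append(edition)
    return dict(works)


def siblings(editions, identifier):
    """Return the IDs of the other editions that share a parent with `identifier`."""
    parent_of = {e.identifier: e.parent_id for e in editions}
    parent = parent_of.get(identifier)
    if parent is None:
        return []
    return [e.identifier for e in editions
            if e.parent_id == parent and e.identifier != identifier]


# Hypothetical sample data; these IDs are placeholders, not real catalog entries.
SAMPLE = [
    Edition("ID-0001", "hardcover", "WORK-A"),
    Edition("ID-0002", "ebook", "WORK-A"),
    Edition("ID-0003", "audio", "WORK-A"),
    Edition("ID-0004", "paperback", "WORK-B"),
]

if __name__ == "__main__":
    for work_id, children in group_by_work(SAMPLE).items():
        print(work_id, [c.identifier for c in children])
    print(siblings(SAMPLE, "ID-0001"))
```

Given one edition's ID, looking up its parent and then listing the parent's other children is all it takes to answer "what other editions of this book exist?".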
Practically speaking, this means that book-related websites now have another way to figure out how different editions are related to each other. Obviously this service is skewed towards stuff that is actually being sold, but it does show some surprising results. For example, this search for one of the Pendragon books shows a few extra editions in Amazon that neither LibraryThing nor WorldCat lists. On the other hand, the canonical Dune example shows better data in WorldCat and LT than in Amazon.
I am currently working on a possible service that will use Amazon’s data as a supplement to xISBN and thingISBN. Because of legal issues, this will not be a web service but rather code that can be used with people’s own AWS accounts. The service is tentatively called amazingISBN. I am also releasing an experimental web tool that retrieves data from all three services side by side right here. Due to use restrictions and underlying legal issues, this service is highly experimental.
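For a rough idea of what a side-by-side comparison involves, here is a small Python sketch that takes edition lists already fetched from several sources and reports which source knows about which edition. It makes no network calls; the source names are only dictionary keys, and the edition IDs are invented placeholders, not output from any of the real services.

```python
def compare_sources(results):
    """Map each edition ID to the sorted list of sources that report it.

    `results` is a dict of source name -> iterable of edition IDs.
    """
    coverage = {}
    for source, ids in results.items():
        for edition_id in set(ids):
            coverage.setdefault(edition_id, []).append(source)
    return {eid: sorted(sources) for eid, sources in coverage.items()}


def only_in(results, source):
    """Return the edition IDs that `source` reports and no other source does."""
    others = set()
    for name, ids in results.items():
        if name != source:
            others.update(ids)
    return sorted(set(results[source]) - others)


# Hypothetical, hand-written results; not real data from any service.
SAMPLE_RESULTS = {
    "xisbn": ["E1", "E2", "E3"],
    "thingisbn": ["E1", "E3"],
    "amazon": ["E1", "E4"],
}

if __name__ == "__main__":
    for edition_id, sources in sorted(compare_sources(SAMPLE_RESULTS).items()):
        print(edition_id, sources)
    print("only in amazon:", only_in(SAMPLE_RESULTS, "amazon"))
```

Editions that appear in only one source are the interesting ones, since they are what each service adds over the others.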
My Interview with DigitalCampus.tv Podcast
March 3, 2008 – 1:10 am
Dan Cohen recently interviewed me for his podcast at DigitalCampus.tv. You can listen to it here; my interview starts at about 20 minutes in. It is focused mainly on PublicDomainReprints.org and surrounding issues.
PubMatic is Bad, Really BAD
March 2, 2008 – 4:41 pm
About four months ago, I decided to try out a new ad optimization service called PubMatic.com. What this service is supposed to do is automatically optimize ads coming from multiple ad networks and display the one that will make the most money. I set it up in the beginning of November 2007 and let it run.
For the next four months or so I was too tied up to check my AdSense stats. Being that this blog is not exactly a money making machine and the Google Ads are supplementing my hosting costs, I usually don’t follow them closely. Nevertheless, when I logged in earlier today to look at an unrelated change, guess what I saw? Being that PubMatic’s service is so great, one would think that I made at least as much money as before? I made nothing - zero - zilch. For some reason not a single Google AdSense ad served via PubMatic on my site was ever recorded. No clicks, no money, nothing. Even the PubMatic reports show no ads served for Google, just Yahoo ads.
I dropped them like a hot potato. All ads have been switched to direct Google AdSense. Needless to say, I am not very happy about it. For a startup trying to gain customers, the least they could have done is check that the ads were serving OK.
Caveat Emptor.
Business Checks on the Cheap
February 28, 2008 – 11:31 pm
Until recently, I didn’t need physical checks for my publishing business - instead I just used the free BillPay service provided by the bank and saved on postage. However, a few days ago I found out about the new business checking account from ING Direct that has even higher interest than consumer accounts (3.75% at time of writing). The problem is that they need a copy of a check as part of the account opening process. So, I had to order checks.
For some reason, check companies really rip you off. For example, a single box of personal checks with a simple design costs $8.99 at ChecksInTheMail.com, a little over $10/box at ChecksUnlimited, and $24/box at ClarkeAmerican which my bank uses. Business checks are almost double the price.
How much did I pay? Only $5.96/box. How? I ordered them from Walmart Checks. I ordered my personal checks from them about a year ago, and now ordered my business checks as well (I ordered my business checks through the personal checks section since I wanted a regular check book and my address for personal and business uses match). I got them a week later.
And guess what? Judging by the mail headers of the emails I got from Walmart Checks, they actually come from citm.com, which is ChecksInTheMail.com. That is the same site that charges about $3 more ($8.99 versus $5.96) for the same box of checks.
Installing Subversion and Trac on 1and1 Shared Hosting
February 28, 2008 – 12:05 am
Installing server side components on shared hosting is always a challenge. In the last few weeks, as I have begun to take on more web based consulting assignments, I have found myself needing source code management as well as project management. At my old startup we used Subversion in combination with Trac, and I decided to use this for myself as well. Here is a summary of the steps I took to install these on 1&1 shared hosting:
1. For Subversion, I basically followed this guide written by Joe Maller. For 1&1 there are two important notes: you need to specify --without-neon in order to avoid errors with the svn/python bindings, and it also helps to install everything into a single directory.
2. For Trac, you need to download and extract Python 2.5. Then you can install it with ./configure --prefix=SOME DIRECTORY. After installation, just update your PATH in .bash_profile and log in again. Keep in mind: Subversion will only be available via SSH and only to a single user. 1&1 does not allow multiple SSH users on one account, and they do not have the SSH module installed in Apache. If you are looking for multiuser Subversion, try a different provider.
3. After Trac is installed, you can use trac.cgi or trac.fcgi for the web based stuff. However, do change the shebang line to point to the Python 2.5 executable in your home directory.
4. SQLite does not work on 1&1. Use MySQL instead.
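The shebang change in step 3 can be done by hand in any editor. As an illustration only, here is a small sketch in present-day Python 3 (it would not run on the Python 2.5 mentioned above) that rewrites the first line of a script. The interpreter path and the file name in the usage example are placeholders I made up, not the actual paths on a 1&1 account.

```python
from pathlib import Path


def with_shebang(source, interpreter):
    """Return `source` with its first line set to a shebang for `interpreter`.

    An existing shebang is replaced; otherwise a new one is prepended.
    """
    lines = source.splitlines(keepends=True)
    shebang = "#!" + interpreter + "\n"
    if lines and lines[0].startswith("#!"):
        lines[0] = shebang
    else:
        lines.insert(0, shebang)
    return "".join(lines)


def set_shebang(script_path, interpreter):
    """Rewrite the shebang line of the file at `script_path` in place."""
    path = Path(script_path)
    path.write_text(with_shebang(path.read_text(), interpreter))


if __name__ == "__main__":
    # Placeholder path: substitute wherever Python was actually installed.
    example = "#!/usr/bin/python\nprint('hello')\n"
    print(with_shebang(example, "/path/to/home/python25/bin/python"))
```

For a real file, something like `set_shebang("trac.cgi", "/path/to/home/python25/bin/python")` would apply the same change in place, again with the path adjusted to the actual install directory.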
Unfortunately, after spending a sizable part of my day on this, I realized that I need multiuser access. Since 1&1 does not allow more than one SSH user, I wasted my time with this. Instead, I just got an account with DreamHost, which will host SVN and Trac. DreamHost isn’t the most reliable host in the world, so it looks like I will be using 1&1 for the stuff that needs reliability, like websites, and DreamHost for development.
Regular Updates for PublicDomainReprints.org
February 21, 2008 – 10:41 pm
Some regular updates:
1. Amazon EC2 image upgraded to Fedora Core 8.
2. The processed book lists and the 20 recent book RSS feed now point to a book details page.
3. Support for additional link formats has been added.
4. Store links have been removed since they were confusing people.
5. We are still trying to get in touch with the Universal Library folks.
We are almost at the level of 400 books requested. Thanks to everyone who spread the word about our project.
Upgrading Fedora Core 6 to Fedora Core 8 on Amazon EC2
February 19, 2008 – 3:13 pm
One of the long overdue tasks that I managed to get done today was upgrading the Amazon EC2 image used for PublicDomainReprints.org to Fedora Core 8. I ran into three small issues, and I am posting my solutions in the hope that they will help someone else.
1. When running yum update, the following error comes up:
Missing Dependency: /usr/bin/rebuild-security-providers is needed by package java-1.5.0-gcj
To resolve the issue, download the following three packages manually from the FC8 repository and install them in this order: first java_cup by itself, then sinjdoc and java-1.5.0-gcj together.
2. The second problem comes up when trying to bundle the image using ec2-bundle-vol. The following error comes up:
rsync: failed to set times on …
This error is described in detail in this post. The solution that I picked was similar to one of those described in the post: I downloaded the RPM package for rsync from FC6 and manually replaced the executable in /usr/bin.
3. The third problem happens when creating an image manifest in ec2-bundle-vol. The error is as follows:
/usr/lib/ruby/1.8/rexml/text.rb:292: in normalize: private method gsub called for 43:Fixnum (NoMethodError)
The solution is to get the latest document.rb and text.rb files from the SVN repository here and replace the old ones in /usr/lib/ruby/1.8/rexml/. Or you can do a diff as described in this post.
UPDATE: A fourth issue came up as well. Ghostscript no longer has JPEG2000 support, and as a result lots of “JPXDecode” errors occur when processing. The reason is that the Fedora Core 8 Ghostscript package is no longer compiled with JPEG2000 support as of version 8.61-6.fc.8. The solution is to downgrade to 8.61-5, which can be found here. I contacted the packager and will post a reply.
UPDATE #2: The Ghostscript problem has been filed as bug #433897 with Red Hat. I also emailed the packager but received no reply.
Simple Solution for Amazon’s Web Services Reliability
February 16, 2008 – 11:25 pm
This past Friday a major 2 1/2 hour outage of Amazon web services hit the Internet. The blogs were abuzz with the gory details, but the resounding theme has been that the service is not reliable enough yet. As an AWS user, and like many others who visit their forums, I know well that Amazon’s web services experience issues on a frequent basis (although not as bad as Friday’s). This is why, for example, my own projects that use Amazon’s web services do so on an asynchronous basis.
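One way to read "asynchronous" here is that work destined for a remote service goes into a queue and is retried later if the service is down, rather than failing the whole request on the spot. The Python sketch below illustrates that pattern only; it is not the code my projects run. The `send` callable, the job names, and the `max_attempts` limit are assumptions for the example.

```python
from collections import deque


def drain(queue, send, max_attempts=3):
    """Process queued jobs, re-queueing any job whose `send` call raises.

    Jobs must be hashable. A job is given up on after `max_attempts`
    failed calls. Returns (done, failed) as two lists of jobs.
    """
    done, failed = [], []
    attempts = {}
    while queue:
        job = queue.popleft()
        attempts[job] = attempts.get(job, 0) + 1
        try:
            send(job)
            done.append(job)
        except Exception:
            if attempts[job] < max_attempts:
                queue.append(job)  # try again after the jobs queued behind it
            else:
                failed.append(job)
    return done, failed


if __name__ == "__main__":
    calls = {}

    def flaky_send(job):
        """Stand-in for a remote call: job 'b' fails on its first attempt."""
        calls[job] = calls.get(job, 0) + 1
        if job == "b" and calls[job] < 2:
            raise IOError("service unavailable")

    print(drain(deque(["a", "b", "c"]), flaky_send))
```

A real worker would also wait between passes and persist the queue somewhere durable, so that an outage of a couple of hours only delays the work instead of losing it.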
In my personal opinion, there is a very simple way to make their services very reliable and to make people trust them: just host all of Amazon.com’s images on the web services. Since Amazon as a whole makes billions from its sites versus a paltry 130 million from the web services, that would force the web services team to provide high reliability and would restore trust in the service.
Just my own two cents.
Bug Fixes for PublicDomainReprints.org
February 13, 2008 – 10:40 pm
Here are some bug fixes that were implemented on PublicDomainReprints.org:
- Resizing for Internet Archive books did not fully work for books that were too large. This has been fixed.
- Support for illustrations in Google Books has been fixed.
- Support for more Google Books URLs has been added.
- Rudimentary support for extra margins has been added.
- Additional print on demand services are in alpha testing.
Opera and Flash under Ubuntu 7.10 (Gutsy)
February 4, 2008 – 9:49 am
This is a short guide for people who want to use Flash with Opera under Ubuntu Linux 7.10 (Gutsy):
1. Enable the proposed repository in System->Administration->Software Sources.
2. Go to Synaptic and install the latest flash plugin from the proposed repository.
3. Go to Opera’s website and install the 9.50 beta version.
4. Start Opera and go to Tools->Preferences->Advanced->Downloads. Search for the extension “swf”.
5. In the extensions box, type in “swf, flv”.
6. Find the entry for “flv” in the downloads list. If the Totem browser plugin is listed, remove it.
Minor Updates for PublicDomainReprints.org
January 20, 2008 – 11:37 pm
Over the weekend, some minor updates were applied to the PublicDomainReprints.org website and service:
1. A separate RSS feed now exists for updates and blog posts. For now, the blog posts are being pulled out of my personal blog but as the service grows that will probably change.
2. An expanded IPR page now lists terms and conditions of originating archives.
3. The processed books page now has links to Amazon, LibraryThing and others.
4. The FAQ page has been re-organized, categorized and revamped.
5. The underlying name for the project is now “Public Domain Archive and Reprints Service” and has been so registered with the State of Maryland. This was done to reflect the expanding role of this project.
More to come soon. Comments are welcome either on this blog or via email to reprints /at/ publicdomainreprints [dot] org.
Blog Coverage of PublicDomainReprints.org
January 14, 2008 – 12:56 am
The last week has been a whirlwind in terms of traffic: over 7,000 hits resulting from various blog posts from around the web as well as various bookmarking services. I want to thank all of the people who helped publicize this little project and want to mention below some of the most significant posts:
- Open Access News post by Peter Suber (Nov 22nd, 2007)
- Teleread post by David Rothman (Nov 22nd, 2007)
- Google Blogoscoped post by Philipp Lenssen (Jan 2nd, 2008)
- Blog post by Brad DeLong (Jan 5th, 2008)
- Metafilter post by Stephen Balbach (Jan 10th, 2008)
- Lifehacker post by Kevin Purdy (Jan 11th, 2008) [from a post in RedFerret Journal]
Please let me know if I missed anyone else. More updates on the service itself to come soon.
PublicDomainReprints.org and Google Books Support
December 31, 2007 – 4:34 pm
My little experiment for reprinting public domain books has been running for a little over a month. About 75 books have been set up and printed so far. Today I added support for public domain books from Google Book Search and moved the service to its own domain. It will now live at PublicDomainReprints.org.
In the works: support for Universal Library, a new cheaper printing service and possibly more trim sizes/binding options. Stay tuned.