
Archive for the ‘Programming’ Category

Cleaning Up Bad HTML in Perl

Friday, October 24th, 2008

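Here is a short way to clean up bad HTML input and convert it to XML with Perl:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use XML::LibXML;

my $html_code = '';    # the bad HTML input to clean up

# Parse the HTML into a tree; HTML::TreeBuilder fills in implied and
# missing tags as it builds the tree
my $builder = HTML::TreeBuilder->new();
my $xml_source = $builder->parse($html_code);
$xml_source->eof();          # signal end of input so buffered text is processed
$xml_source->elementify();   # convert the tree to plain HTML::Element objects
my $xml_source1 = $xml_source->as_XML();

# Re-parse the serialized XML with libxml2, recovering from any remaining errors
my $parser = XML::LibXML->new();
$parser->recover(1);
my $doc = $parser->parse_string($xml_source1);
my $xml_source2 = $doc->toString();
```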

Using XSLT for Very Large Files

Monday, October 20th, 2008

While working recently on one of my projects, I noticed a curious problem: the server I was using was running out of memory during an XSLT transform. That was strange, because the transform in question was rather simple and the amount of memory ...

JSON Without Callbacks

Monday, October 20th, 2008

During my investigations into Google Reader and iGoogle, I ran into an issue that has not been clearly addressed anywhere. The question is this: if a site provides a JSON feed without a callback function and you are using it from a different domain (meaning you cannot use XMLHttpRequest), can you ...

Unofficial Google Reader gadget for iGoogle

Sunday, October 19th, 2008

For the past few weeks, my RSS reader (Bloglines) has not been behaving. Now comes a post on TechCrunch reporting that the founder of Bloglines is considering switching to Google Reader. I started exploring Google Reader to see if it would fit my needs (notice the new feed on the sidebar). ...

Installing Subversion and Trac on 1and1 Shared Hosting

Thursday, February 28th, 2008

Installing server-side components on shared hosting is always a challenge. In the last few weeks, as I have begun to take on more web-based consulting assignments, I have found myself needing both source code management and project management. At my old startup, we used Subversion ...

Why the Left() Function Stops Working in VBA

Thursday, November 22nd, 2007

This is an interesting problem that my wife had at work recently. In a VBA-based program, the Left function suddenly stopped working with an error along the lines of "type data mismatch". Since this is a native VBA function, my first thought was that it was caused by ...

Flattening Transparencies in PDF with Free Tools

Friday, November 9th, 2007

An interesting issue came up recently at my publishing company: one of our printing suppliers flagged incoming PDFs as unprintable due to transparencies. After looking around for solutions, I came up with a way to resolve the issue without resorting to Acrobat (which we don't use). The solution is twofold:

1. First, convert the incoming PDF to PostScript using XPDF's pdftops. This will flatten the transparencies. GhostScript's pdf2ps tool DOES NOT do that.
2. Then convert the PostScript back to PDF using GhostScript's ps2pdf tool.

Both tools are open source and free (although watch out for GhostScript's GPL license). One important point - pdftops requires a paper width and ...
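The two-step round trip described in this post could be scripted roughly as follows. This is a minimal sketch in Perl, not the author's actual script: the `flatten_pdf` helper and its `$run` parameter are hypothetical, it assumes `pdftops` and `ps2pdf` are installed and on the PATH, and it passes no paper-size options to pdftops (the post's note about paper width is cut off, so this sketch does not guess at it).

```perl
use strict;
use warnings;

# Hypothetical helper: flatten transparencies by round-tripping a PDF
# through PostScript, following the two steps in the post:
#   1. pdftops (XPDF)        PDF -> PostScript (flattens transparencies)
#   2. ps2pdf (GhostScript)  PostScript -> PDF
# $run executes one command given as a list; it defaults to system(),
# but can be replaced, e.g. to print or record commands instead.
sub flatten_pdf {
    my ($in_pdf, $out_pdf, $run) = @_;
    $run //= sub { system(@_) == 0 or die "command failed: @_\n" };

    my $ps_file = "$out_pdf.ps";    # intermediate PostScript file

    # Step 1: PDF -> PostScript with pdftops (not pdf2ps, which the
    # post says does not flatten transparencies)
    $run->('pdftops', $in_pdf, $ps_file);

    # Step 2: PostScript -> PDF with GhostScript's ps2pdf
    $run->('ps2pdf', $ps_file, $out_pdf);

    unlink $ps_file;    # remove the intermediate file (no-op if absent)
    return $out_pdf;
}
```

For example, `flatten_pdf('incoming.pdf', 'flattened.pdf');` (file names hypothetical) would write `flattened.pdf` and delete the intermediate `flattened.pdf.ps`. Passing a custom `$run`, such as one that only prints each command, is a cheap way to check the command sequence before running it for real.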

Another Book Search Engine Experiment

Monday, July 30th, 2007

About two years ago, I coded a small experimental search engine for books that used Ajax and Amazon Web Services. Recently, I went back to the same concept and put up a new experiment - a meta-search engine for book information that aggregates book data from about 60 different ...

Converting from DJVU to PDF

Wednesday, July 25th, 2007

One of the more mundane tasks that face every publishing business like mine is data conversion. Recently, I have been involved in a major project that seeks to make several hundred titles available in print-on-demand format. Unfortunately, the library that scanned these titles did not use PDF ...

An Ajax Search Engine Without Servers (Almost)

Tuesday, July 10th, 2007

For a while, I have been working on a hobby project: a meta-search engine that lets you search multiple search engines by tag. The catch? No server-side components. This search engine runs entirely client-side in the user's browser, using RSS feeds from ...