<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Personal Website of Yakov Shafranovich &#187; Programming</title>
	<atom:link href="http://www.shaftek.org/blog/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.shaftek.org</link>
	<description>ShafTek.org = SHAFranovich TECHnologies</description>
	<lastBuildDate>Thu, 02 Feb 2012 02:24:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Enabling Other Languages on Amazon&#8217;s New Kindle Fire tablet</title>
		<link>http://www.shaftek.org/blog/2011/12/03/enabling-other-languages-on-amazons-new-kindle-fire-tablet/</link>
		<comments>http://www.shaftek.org/blog/2011/12/03/enabling-other-languages-on-amazons-new-kindle-fire-tablet/#comments</comments>
		<pubDate>Sun, 04 Dec 2011 04:40:44 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Projects]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=1051</guid>
		<description><![CDATA[IMPORTANT: The source code and all future development of this application is now moved to GitHub. Please use that page in the future: https://github.com/shaftekbiz/android-language-settings-app &#8212;&#8212;&#8212;&#8212;&#8212;- One of the interesting aspects of the new Kindle Fire is how much Amazon had customized or simply overrode the default UI, including some of the settings pages. One issue [...]]]></description>
			<content:encoded><![CDATA[<p><strong>IMPORTANT: The source code and all future development of this application have moved to GitHub. Please use that page in the future:</strong></p>
<p><strong><a href="https://github.com/shaftekbiz/android-language-settings-app">https://github.com/shaftekbiz/android-language-settings-app</a></strong></p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;-</p>
<p>One of the interesting aspects of the new Kindle Fire is how much Amazon has customized or simply overridden the default UI, including some of the settings pages. One issue that has come up recently is how to enable languages other than English. A fellow Kindle Fire user in Germany, <strong>Gero Zahn</strong>, figured out how to do that using two separate apps that trick the Kindle Fire into opening the input language settings page, which is hidden but still present on the device. His approach is described in detail in the following blog post, and the credit for discovering this goes to him:</p>
<p><a href="http://blog.gerozahn.de/2011/11/kindle-fire-keyboard-layouts-solved/">http://blog.gerozahn.de/2011/11/kindle-fire-keyboard-layouts-solved/</a></p>
<p>I was looking for a simpler way to do this and came up with a very simple Android app using <a href="http://www.appinventorbeta.com/">Google&#8217;s AppInventor</a> that does just that &#8211; gives you access to the input language settings. This approach does not require installing outside applications other than this app itself. If you already have Android Market sideloaded into the Kindle, you can find the app <a href="https://market.android.com/details?id=appinventor.ai_yakov.LanguageSettings">here</a>.</p>
<p><strong>If you do not have the Market installed, you can download the app here:</strong></p>
<p><strong><a href="http://goo.gl/NfEqO">http://goo.gl/NfEqO</a></strong></p>
<p><strong>This app is also available on Amazon&#8217;s AppStore, but it has not yet been approved for the Kindle Fire. To see this app on Amazon.com, <a href="http://www.amazon.com/dp/B0071LQXCK">click here</a>.</strong></p>
<p>Make sure to allow outside apps to be installed on your Kindle: tap the top right corner of the screen to enter the settings section, tap &#8220;Device&#8221;, and turn on &#8220;Allow installation&#8221; for non-approved applications. I have submitted the app to the Amazon AppStore for approval as well, and hopefully it will be available directly from it.</p>
<p>For the technically inclined, here is a short explanation of what is happening:</p>
<p>Actions on Android that cross application boundaries are triggered using something called &#8220;<a href="http://developer.android.com/guide/topics/intents/intents-filters.html">Intents</a>&#8221;. Two of them lead to language settings. The first (<strong>com.android.settings.LanguageSettings</strong>) has been customized by Amazon to show their own keyboard options. The second, &#8220;<strong>com.android.inputmethod.latin.InputLanguageSelection</strong>&#8221;, is the one that actually opens the language selection menu. The action for it is &#8220;<strong>android.intent.action.VIEW</strong>&#8221;.</p>
<p><strong>UPDATE #1:</strong></p>
<p>In my testing, only the following languages work:</p>
<ul>
<li>Danish</li>
<li>English UK</li>
<li>English US</li>
<li>French</li>
<li>German</li>
<li>Hebrew</li>
<li>Norwegian</li>
<li>Russian</li>
<li>Serbian</li>
<li>Swedish</li>
</ul>
<p><strong>UPDATE #2 &#8211; January 22nd, 2012</strong></p>
<p>The app has been updated to v1.3. It is now a fully native Android app and no longer uses App Inventor. The size has been brought down from 1.4 MB to 22 KB. I also added support for Android 4 (ICS). The code will be open sourced shortly. You can use Google Market to update, or the same link above.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2011/12/03/enabling-other-languages-on-amazons-new-kindle-fire-tablet/feed/</wfw:commentRss>
		<slash:comments>41</slash:comments>
		</item>
		<item>
		<title>Installing Eclipse Visual Editor 1.5 on Ubuntu 10.10</title>
		<link>http://www.shaftek.org/blog/2011/03/30/installing-eclipse-visual-editor-1-5-on-ubuntu-10-10/</link>
		<comments>http://www.shaftek.org/blog/2011/03/30/installing-eclipse-visual-editor-1-5-on-ubuntu-10-10/#comments</comments>
		<pubDate>Thu, 31 Mar 2011 03:44:43 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[eclipse]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[ubuntu]]></category>
		<category><![CDATA[visual editor]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=902</guid>
		<description><![CDATA[I had been recently kicking around some coding ideas but until now all of my hobby coding has been done in Perl. I wanted to try something browser-based which led me back to Java, so of course I installed Eclipse as my IDE. However, I ran into an issue when I tried to install Visual [...]]]></description>
			<content:encoded><![CDATA[<p>I had recently been kicking around some coding ideas, but until now all of my hobby coding had been done in Perl. I wanted to try something browser-based, which led me back to Java, so of course I installed <a href="http://www.eclipse.org/">Eclipse</a> as my IDE. However, I was not able to install <a href="http://www.eclipse.org/vep/">Visual Editor</a>, a plugin used for developing GUI applications. After some extensive Googling and trial and error, I finally came up with a solution, which I am sharing below.</p>
<p>First of all, about versioning: there are two current versions of Eclipse &#8211; 3.5 aka Galileo and 3.6 aka Helios. Ubuntu 10.10, which I use on my home box, comes with the 3.5 version. Visual Editor also has two versions: 1.4, dating from 2009, and a more recent 1.5 release from late 2010. Because I wanted the 1.5 version with Eclipse 3.6, and after reading multiple posts describing issues with the Ubuntu packaged Eclipse, I installed the default packages and installed Eclipse 3.6 manually as follows:</p>
<ol>
<li>Download the Eclipse IDE for Java developers, 32 bit or 64 bit Linux version from <a href="http://www.eclipse.org/downloads/">here</a>.</li>
<li>I set up Eclipse on my machine in the <strong>/opt/eclipse</strong> folder as root; however, you can also do it on a per-user basis in the home directory <a href="https://help.ubuntu.com/community/EclipseIDE#User installation">as described here</a>.</li>
<li>Create a shortcut to the Eclipse application itself and put it anywhere you want (I stuck mine directly on a panel).</li>
</ol>
<p>To install Visual Editor 1.5:</p>
<ol>
<li>Click on &#8220;Help&#8221;, &#8220;Install New Software&#8221;, &#8220;Add&#8221; to add the Visual Editor repository. Use the following URL: <strong>http://download.eclipse.org/tools/ve/updates/1.5.0/</strong></li>
<li>When you try to install VE, you will get an error about a package called <strong>org.eclipse.jem</strong>. This package is part of Eclipse&#8217;s GEF tools, so go ahead and add a repository for GEF with the following URL: <strong>http://download.eclipse.org/tools/gef/updates/releases/</strong></li>
<li>Install the GEF SDK itself or Draw2D.</li>
<li>Go back and install Visual Editor, then restart Eclipse.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2011/03/30/installing-eclipse-visual-editor-1-5-on-ubuntu-10-10/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comodo SSL Breach and Mobile Devices</title>
		<link>http://www.shaftek.org/blog/2011/03/25/comodo-ssl-breach-and-mobile-devices/</link>
		<comments>http://www.shaftek.org/blog/2011/03/25/comodo-ssl-breach-and-mobile-devices/#comments</comments>
		<pubDate>Fri, 25 Mar 2011 13:20:48 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=900</guid>
		<description><![CDATA[A recent breach at an SSL Certificate Authority (Comodo) resulted in nine fake SSL certificates being issued for sites like Gmail, Yahoo, etc. [details here at the EFF]. While desktop browsers issued updates, the overlooked issue here is mobile. Browsers on mobile devices are usually in firmware, and issuing firmware updates is not trivial. [...]]]></description>
			<content:encoded><![CDATA[<p>A recent breach at an SSL Certificate Authority (Comodo) resulted in nine fake SSL certificates being issued for sites like Gmail, Yahoo, etc. [details <a href="http://www.eff.org/deeplinks/2011/03/iranian-hackers-obtain-fraudulent-https">here at the EFF</a>]. While desktop browsers issued updates, the overlooked issue here is mobile. Browsers on mobile devices are usually in firmware, and issuing firmware updates is not trivial. That means that currently most mobile devices are vulnerable to this fake SSL mess &#8211; something that no one has mentioned.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2011/03/25/comodo-ssl-breach-and-mobile-devices/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Manipulating Files in the Cloud</title>
		<link>http://www.shaftek.org/blog/2009/05/14/manipulating-files-in-the-cloud/</link>
		<comments>http://www.shaftek.org/blog/2009/05/14/manipulating-files-in-the-cloud/#comments</comments>
		<pubDate>Fri, 15 May 2009 03:54:50 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[amazon ec2]]></category>
		<category><![CDATA[ec2]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=691</guid>
		<description><![CDATA[A few days ago I got to see the power of cloud computing up close and personal. Someone had a large number of files already stored in Amazon S3 which needed to be combined with another large set of files. The problem was that my desktop could do it but it would take forever [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago I got to see the power of cloud computing up close and personal. Someone had a large number of files already stored in Amazon S3 which needed to be combined with another large set of files. The problem was that my desktop could do it, but it would take forever to download and upload. The solution: one EC2 instance of stock Ubuntu 9.04 running for about 15-20 minutes. Download the files with s3cmd, rename using shell scripts, combine and reupload.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2009/05/14/manipulating-files-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Turning Playlist.com into Podcasts and Playing Them on Cell Phones</title>
		<link>http://www.shaftek.org/blog/2009/05/10/turning-playlistcom-into-podcasts-and-playing-them-on-cell-phones/</link>
		<comments>http://www.shaftek.org/blog/2009/05/10/turning-playlistcom-into-podcasts-and-playing-them-on-cell-phones/#comments</comments>
		<pubDate>Mon, 11 May 2009 02:00:53 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[playlist.com]]></category>
		<category><![CDATA[podcasting]]></category>
		<category><![CDATA[rss]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=684</guid>
		<description><![CDATA[PlayList.com is a website that allows anyone to put together a playlist of their music and then share it via their Facebook/Myspace/etc page via an embedded flash Playlist. A nifty hack allows you to play a PlayList.com as a podcast or even via a cell phone. 1. Transform PlayList.com&#8217;s ASX playlist into Podcast RSS via [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;"><a href="http://www.playlist.com">PlayList.com</a> is a website that allows anyone to put together a playlist of their music and then share it on their Facebook/Myspace/etc page via an embedded Flash playlist. A nifty hack allows you to play a PlayList.com playlist as a podcast or even via a cell phone.</p>
<p style="text-align: left;">1. Transform PlayList.com&#8217;s ASX playlist into Podcast RSS via this link:</p>
<p style="text-align: left;">http://www.w3.org/2005/08/online_xslt/xslt?xslfile=http://www.shaftek.org/code/asx2rss/asx2rss.xsl&amp;xmlfile=http://www.playlist.com/playlist/XXXXXXXX/asx</p>
<p style="text-align: left;">Where XXXXXXXX is the Playlist.com user ID.</p>
<p style="text-align: left;">2. Use the resulting podcast with your podcatcher or mobile phone. Or even cooler, use <a href="http://www.podlinez.com">PodLinez.com</a> or <a href="http://foneshow.com/">FoneShow</a> to listen over a regular phone.</p>
<p style="text-align: left;"><strong>DISCLAIMER:</strong> I am not responsible for any illegal behavior that results. You should keep in mind that PlayList.com may be indexing illegal music which you will end up downloading via the podcast RSS. It is your responsibility to check.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2009/05/10/turning-playlistcom-into-podcasts-and-playing-them-on-cell-phones/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deleting Amazon S3 Bucket with A Lot of Files</title>
		<link>http://www.shaftek.org/blog/2009/05/06/deleting-amazon-s3-bucket-with-a-lot-of-files/</link>
		<comments>http://www.shaftek.org/blog/2009/05/06/deleting-amazon-s3-bucket-with-a-lot-of-files/#comments</comments>
		<pubDate>Thu, 07 May 2009 02:28:38 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=680</guid>
		<description><![CDATA[Here is a short script that can mass delete files in an Amazon S3 bucket. It is limited to a 1,000 keys at a time: #!/usr/bin/perl use Net::Amazon::S3; my $s3 = Net::Amazon::S3-&#62;new( {   aws_access_key_id     =&#62; 'ACCESS_ID', aws_secret_access_key =&#62; 'ACCESS_KEY', retry                 =&#62; 1, } ); my $bucket = $s3-&#62;bucket("BUCKET") or die $s3-&#62;err . ": " . [...]]]></description>
			<content:encoded><![CDATA[<p>Here is a short script that can mass delete files in an Amazon S3 bucket. It is limited to 1,000 keys at a time, so run it again until the bucket is empty:</p>
<pre>#!/usr/bin/perl

use strict;
use warnings;
use Net::Amazon::S3;

my $s3 = Net::Amazon::S3-&gt;new(
    {   aws_access_key_id     =&gt; 'ACCESS_ID',
        aws_secret_access_key =&gt; 'ACCESS_KEY',
        retry                 =&gt; 1,
    }
);

# Look up the bucket and fetch one page of keys (at most 1,000)
my $bucket = $s3-&gt;bucket("BUCKET") or die $s3-&gt;err . ": " . $s3-&gt;errstr;
my $response = $bucket-&gt;list or die $s3-&gt;err . ": " . $s3-&gt;errstr;

# Delete every key returned in that page
foreach my $key (@{$response-&gt;{keys}}) {
    my $key_name = $key-&gt;{key};
    print "Deleting '$key_name'\n";
    $bucket-&gt;delete_key($key_name) or die $s3-&gt;err . ": " . $s3-&gt;errstr;
}
exit;</pre>
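<p>Because each listing returns at most 1,000 keys, a bucket with more files needs the list-and-delete step repeated. Here is a minimal sketch of that loop. It assumes only the two bucket methods used in the script above (<strong>list</strong> and <strong>delete_key</strong>) and that an empty bucket lists as an empty &#8220;keys&#8221; array; <strong>empty_bucket</strong> is a hypothetical helper name, not part of Net::Amazon::S3.</p>

```perl
use strict;
use warnings;

# Delete every key in a bucket by listing and deleting one page at a time.
# $bucket is expected to behave like the Net::Amazon::S3 bucket object used
# in the script above: list() returns a hash reference with a "keys" array,
# and delete_key() removes a single key. Both are assumptions of this sketch.
sub empty_bucket {
    my ($bucket) = @_;
    my $deleted = 0;
    while (1) {
        my $response = $bucket->list or die "could not list keys\n";
        my @keys = @{ $response->{keys} };
        last unless @keys;    # nothing left to delete
        foreach my $key (@keys) {
            $bucket->delete_key($key->{key})
                or die "could not delete '$key->{key}'\n";
            $deleted++;
        }
    }
    return $deleted;          # total number of keys removed
}
```

<p>With the original script, this would be called as <strong>empty_bucket($bucket)</strong> in place of the single <strong>foreach</strong> loop.</p>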
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2009/05/06/deleting-amazon-s3-bucket-with-a-lot-of-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cleaning Up Bad HTML in Perl, Take 2</title>
		<link>http://www.shaftek.org/blog/2009/02/09/cleaning-up-bad-html-in-perl-take-2/</link>
		<comments>http://www.shaftek.org/blog/2009/02/09/cleaning-up-bad-html-in-perl-take-2/#comments</comments>
		<pubDate>Tue, 10 Feb 2009 04:10:27 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=607</guid>
		<description><![CDATA[(A followup on an earlier post) Here is another way to cleanup bad HTML with Perl, and convert to XML: use HTML::DOMbo; use HTML::TreeBuilder; use XML::LibXML; $html_code = ''; // Parse HTML my $builder = HTML::TreeBuilder-&#62;new(); $xml_source = $builder-&#62;parse($html_code); // Convert to XML DOM $xml_source1 = $xml_source-&#62;to_XML_DOM; // Extract XML and encode UTF-8 $xml_source2 = [...]]]></description>
			<content:encoded><![CDATA[<p>(A followup on <a href="http://www.shaftek.org/blog/2008/10/24/cleaning-up-bad-html-in-perl/">an earlier post</a>)</p>
<p>Here is another way to clean up bad HTML with Perl and convert it to XML:</p>
<pre>use Encode;
use HTML::DOMbo;
use HTML::TreeBuilder;
use XML::LibXML;

$html_code = '';

# Parse HTML
my $builder = HTML::TreeBuilder-&gt;new();
$xml_source = $builder-&gt;parse($html_code);
$builder-&gt;eof();

# Convert to XML DOM
$xml_source1 = $xml_source-&gt;to_XML_DOM;

# Extract XML and encode UTF-8
$xml_source2 = encode("utf-8", $xml_source1-&gt;toString);</pre>
<p>This approach relies on the <strong><a href="http://search.cpan.org/dist/HTML-DOMbo/">HTML::DOMbo</a></strong> module to do the actual conversion between HTML and XML, and <a href="http://search.cpan.org/~petek/HTML-Tree/lib/HTML/TreeBuilder.pm"><strong>HTML::TreeBuilder</strong></a> for parsing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2009/02/09/cleaning-up-bad-html-in-perl-take-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Handling Unicode Data in Amazon S3 Headers</title>
		<link>http://www.shaftek.org/blog/2008/12/28/handling-unicode-data-in-amazon-s3-headers/</link>
		<comments>http://www.shaftek.org/blog/2008/12/28/handling-unicode-data-in-amazon-s3-headers/#comments</comments>
		<pubDate>Sun, 28 Dec 2008 22:38:52 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[amazon s3]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[s3]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=582</guid>
		<description><![CDATA[During a recent project, I ran into an issue when handling Unicode data in metadata headers in Amazon S3. Apparently, Amazon adds on &#8220;?UTF-8?B?&#8221; in front of any Unicode data and &#8220;?=&#8221; in end of the data. I could not find any existing standard that describes this or why it is done, but I surmise [...]]]></description>
			<content:encoded><![CDATA[<p>During a recent project, I ran into an issue when handling Unicode data in metadata headers in <a href="http://aws.amazon.com/s3/">Amazon S3</a>. Apparently, Amazon adds <strong>&#8220;=?UTF-8?B?&#8221;</strong> in front of any Unicode data and <strong>&#8220;?=&#8221;</strong> at the end of the data. The wrapper has the same form as a MIME &#8220;encoded-word&#8221; (RFC 2047), in which the &#8220;B&#8221; marks the payload as Base64-encoded, although I could not find anything that explains why it is done here.</p>
<p>An easy Perl hack to get around this is as follows (assuming you are using the <a href="http://search.cpan.org/dist/MIME-Base64/">MIME::Base64</a> module):</p>
<pre>if($var =~ m/^=\?UTF-8\?B\?(.*)\?=/) {
    # $1 is the Base64 payload between the markers
    $results = decode_base64($1);
}</pre>
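<p>The output of <strong>decode_base64</strong> is still a string of UTF-8 bytes, so a second step is needed to get Perl characters. A minimal sketch of both steps together follows; <strong>decode_s3_metadata</strong> is a hypothetical helper name, and the sample value is made up for illustration rather than taken from S3.</p>

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode);

# Decode a metadata value if it is wrapped as =?UTF-8?B?...?=,
# otherwise return it unchanged.
sub decode_s3_metadata {
    my ($value) = @_;
    if ($value =~ m/\A=\?UTF-8\?B\?(.*)\?=\z/i) {
        # $1 holds only the Base64 payload between the markers.
        my $bytes = decode_base64($1);
        # Turn the UTF-8 bytes into a Perl character string.
        return decode('UTF-8', $bytes);
    }
    return $value;
}

binmode(STDOUT, ':encoding(UTF-8)');
print decode_s3_metadata('=?UTF-8?B?Y2Fmw6k=?='), "\n";
```

<p>Values that are not wrapped, such as plain ASCII metadata, pass through untouched.</p>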
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2008/12/28/handling-unicode-data-in-amazon-s3-headers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>QuickBase and Unicode Support</title>
		<link>http://www.shaftek.org/blog/2008/10/27/quickbase-and-unicode-support/</link>
		<comments>http://www.shaftek.org/blog/2008/10/27/quickbase-and-unicode-support/#comments</comments>
		<pubDate>Mon, 27 Oct 2008 20:14:57 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[intuit]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[quickbase]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=554</guid>
		<description><![CDATA[Some quick notes on QuickBase and Unicode: QuickBase stores Unicode data natively on the backend Unicode encoding must be set as default in the browser Any QuickBase functionality that relies on Javascript or AJAX support, DOES NOT work with Unicode The last point is due to the two issues: 1. The bug with UTF-8 encoding [...]]]></description>
			<content:encoded><![CDATA[<p>Some quick notes on <a href="http://www.quickbase.com">QuickBase</a> and Unicode:</p>
<ul>
<li>QuickBase stores Unicode data natively on the backend</li>
<li>Unicode encoding must be set as default in the browser</li>
<li>Any QuickBase functionality that relies on Javascript or AJAX support, DOES NOT work with Unicode</li>
</ul>
<p>The last point is due to two issues:</p>
<p>1. The bug with UTF-8 encoding described in <a href="http://www.shaftek.org/blog/2008/10/27/xmlhttprequest-unicode-and-firefox/">my previous post</a>.</p>
<p>2. The fact that UTF-8 is not decoded properly, as described <a href="http://ecmanaut.blogspot.com/2006/07/encoding-decoding-utf8-in-javascript.html">here</a>.</p>
<p>Same applies to the <a href="https://www.quickbase.com/db/6mztyxu8?a=dr&amp;r=c8">QuickBase Javascript client library</a>. Of course, adding cross browser XML support and proper decoding would fix it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2008/10/27/quickbase-and-unicode-support/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fixing &#8220;Input is not proper UTF-8, indicate encoding&#8221; Error</title>
		<link>http://www.shaftek.org/blog/2008/10/26/fixing-input-is-not-proper-utf-8-indicate-encoding-error/</link>
		<comments>http://www.shaftek.org/blog/2008/10/26/fixing-input-is-not-proper-utf-8-indicate-encoding-error/#comments</comments>
		<pubDate>Sun, 26 Oct 2008 16:21:11 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=548</guid>
		<description><![CDATA[Quick way to fix the following error in Perl: :1: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xA0 0x20 0xA0 0x3C Use this command: use Encode: $string1 = decode("UTF-8", $input);]]></description>
			<content:encoded><![CDATA[<p>Quick way to fix the following error in Perl:</p>
<blockquote>
<pre>:1: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xA0 0x20 0xA0 0x3C</pre>
</blockquote>
<p>Use this command:</p>
<blockquote>
<pre>use Encode;
$string1 = decode("UTF-8", $input);</pre>
</blockquote>
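<p>The bytes in the error message (0xA0 0x20) are not valid UTF-8 at all; in ISO-8859-1, 0xA0 is a non-breaking space. If the input might be in that encoding, one option is to try a strict UTF-8 decode first and fall back. This is a minimal sketch that assumes ISO-8859-1 is the right fallback, which may not be true for your data; <strong>decode_input</strong> is a hypothetical helper name.</p>

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode bytes as UTF-8 when they are valid UTF-8,
# otherwise treat them as ISO-8859-1 (an assumption of this sketch).
sub decode_input {
    my ($bytes) = @_;
    # FB_CROAK makes decode die on malformed input instead of substituting
    # characters; LEAVE_SRC keeps $bytes intact so the fallback can reuse it.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $text if defined $text;
    return decode('ISO-8859-1', $bytes);
}

my $text = decode_input("\xA0 \xA0");
printf "decoded %d characters\n", length $text;
```

<p>The result is a normal Perl character string, which can then be passed to the XML parser or encoded back to UTF-8 on output.</p>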
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2008/10/26/fixing-input-is-not-proper-utf-8-indicate-encoding-error/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Cleaning Up Bad HTML in Perl</title>
		<link>http://www.shaftek.org/blog/2008/10/24/cleaning-up-bad-html-in-perl/</link>
		<comments>http://www.shaftek.org/blog/2008/10/24/cleaning-up-bad-html-in-perl/#comments</comments>
		<pubDate>Fri, 24 Oct 2008 14:18:34 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=546</guid>
		<description><![CDATA[Here is a short way to cleanup bad HTML input and convert to XML with Perl: use HTML::TreeBuilder; use XML::LibXML; $html_code = ''; my $builder = HTML::TreeBuilder-&#62;new(); $xml_source = $builder-&#62;parse($html_code); $xml_source-&#62;elementify(); $xml_source1 = $xml_source-&#62;as_XML(); my $parser = XML::LibXML-&#62;new(); $parser-&#62;recover(1); my $doc = $parser-&#62;parse_string($xml_source1); $xml_source2 = $doc-&#62;toString();]]></description>
			<content:encoded><![CDATA[<p>Here is a short way to clean up bad HTML input and convert it to XML with Perl:</p>
<pre>use HTML::TreeBuilder;
use XML::LibXML;

$html_code = '';

my $builder = HTML::TreeBuilder-&gt;new();
$xml_source = $builder-&gt;parse($html_code);
$xml_source-&gt;elementify();
$xml_source1 = $xml_source-&gt;as_XML();

my $parser = XML::LibXML-&gt;new();
$parser-&gt;recover(1);
my $doc = $parser-&gt;parse_string($xml_source1);
$xml_source2 = $doc-&gt;toString();</pre>
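<p>For comparison, XML::LibXML can also parse HTML directly with libxml2&#8217;s own HTML parser, skipping HTML::TreeBuilder. This is a different technique from the one above, shown as a minimal sketch: <strong>html_file_to_xml</strong> is a hypothetical helper name, the input is assumed to be a file on disk, and the cleanup results may differ from what TreeBuilder produces.</p>

```perl
use strict;
use warnings;
use XML::LibXML;

# Parse a possibly broken HTML file with libxml2's HTML parser
# and return the document serialized as XML text.
sub html_file_to_xml {
    my ($path) = @_;
    my $doc = XML::LibXML->load_html(
        location        => $path,
        recover         => 1,    # keep going after markup errors
        suppress_errors => 1,    # do not print the recoverable errors
    );
    return $doc->toString();
}

# Usage: perl cleanup.pl input.html
print html_file_to_xml($ARGV[0]) if @ARGV;
```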
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2008/10/24/cleaning-up-bad-html-in-perl/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Using XSLT for Very Large Files</title>
		<link>http://www.shaftek.org/blog/2008/10/20/using-xslt-for-very-large-files/</link>
		<comments>http://www.shaftek.org/blog/2008/10/20/using-xslt-for-very-large-files/#comments</comments>
		<pubDate>Mon, 20 Oct 2008 13:32:01 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[sax]]></category>
		<category><![CDATA[xslt]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=540</guid>
		<description><![CDATA[While I was working recently on one of my projects, I noticed a curious problem. The server I was using was running out of memory while doing a simple XSLT transform. That was sort of strange because the XSLT transform in question was rather simple and the amount of memory on the server was very [...]]]></description>
			<content:encoded><![CDATA[<p>While I was working recently on <a href="http://www.publicdomainreprints.org/">one of my projects</a>, I noticed a curious problem. The server I was using was running out of memory while doing a simple XSLT transform. That was sort of strange because the XSLT transform in question was rather simple and the amount of memory on the server was very large (an EC2 instance). After further investigation, it turned out that the issue was due to the large size of the input XML (over 300 MB), which was clogging up the memory. It seems that most XSLT processors, including <a href="http://xmlsoft.org/XSLT/">libXSLT</a> which I was using, load the input XML into memory completely before doing the transform. A better alternative is to use a process similar to SAX where the input XML is loaded and transformed incrementally &#8211; something that is called &#8220;streaming&#8221;. There are several solutions:</p>
<p>1. Saxon XSLT processor <a href="http://saxon.sourceforge.net/">supports &#8220;streaming mode&#8221;</a> which allows processing of files up to 20 GB. BUT this feature is only available in <a href="http://www.saxonica.com/documentation/changes/intro/highlights91.html">the commercial version</a>.</p>
<p>2. An alternative to XSLT is something called <a href="http://stx.sourceforge.net/">STX or &#8220;Streaming Transformations for XML&#8221;</a>, which is specifically designed to address this issue. HOWEVER, it is not a standard of any sort like XSLT and there are only two implementations.</p>
<p>3. There is a streaming XSLT processor released by a team at a national laboratory, but I have misplaced the link.</p>
<p>4. Apache Xalan XSLT Processor in <a href="http://xml.apache.org/xalan-j/dtm.html#incremental">incremental mode</a> (note this is NOT true streaming since the entire original file is eventually loaded into memory).</p>
<p>For my project I chose #4 &#8211; Apache Xalan because (a) I wanted an open source solution and (b) I wanted to stick to the XSLT standard as opposed to STX. I might look into STX in the future to reduce the original XML file in size, and then further process it using standard XSLT tools.</p>
<p>Now the good news is that the next step in the XSLT standardization process at W3C is streaming with something called XSLT2++. Take a look at <a href="http://news.oreilly.com/2008/09/xslt-20-streaming-xml-transfor.html">this O&#8217;Reilly news article</a> and <a href="http://saxonica.blogharbor.com/blog/_archives/2008/9/1/3863838.html">a blog post from Michael Kay</a> (the editor of the XSLT WG at W3C). The bad news is that it will take at least 18 months for the standards process, and who knows how long for the actual implementations.</p>
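<p>To illustrate the incremental idea in Perl: the sketch below uses the pull parser XML::LibXML::Reader, which is not XSLT and not one of the processors listed above. It walks the file one matching element at a time instead of building the whole tree. <strong>for_each_element</strong> is a hypothetical helper name, and the element name to match is whatever your input uses.</p>

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Visit every element with the given name, one at a time,
# without loading the whole document into memory.
sub for_each_element {
    my ($file, $name, $callback) = @_;
    my $reader = XML::LibXML::Reader->new(location => $file)
        or die "cannot open $file\n";
    my $count = 0;
    # nextElement returns 1 while another matching element is found.
    while ($reader->nextElement($name) == 1) {
        # Copy just this element and its children into a small DOM node.
        $callback->($reader->copyCurrentNode(1));
        $count++;
    }
    return $count;
}

# Usage: perl stream.pl big.xml record
if (@ARGV == 2) {
    my ($file, $name) = @ARGV;
    my $n = for_each_element($file, $name,
        sub { print $_[0]->textContent, "\n" });
    print "$n elements processed\n";
}
```

<p>Each copied node is a regular XML::LibXML element, so it can be inspected with the usual DOM methods and then discarded before the next one is read.</p>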
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2008/10/20/using-xslt-for-very-large-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>JSON Without Callbacks</title>
		<link>http://www.shaftek.org/blog/2008/10/20/json-without-callbacks/</link>
		<comments>http://www.shaftek.org/blog/2008/10/20/json-without-callbacks/#comments</comments>
		<pubDate>Mon, 20 Oct 2008 12:02:03 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[json]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=538</guid>
		<description><![CDATA[During my investigations into Google Reader and iGoogle, I ran into an issue which has not been clearly addressed anywhere. The question is if a site provides a JSON feed without a callback function and you are using it on a different domain (meaning you cannot use XmlHttpRequest), can you still take advantage of it? [...]]]></description>
			<content:encoded><![CDATA[<p>During <a href="http://www.shaftek.org/blog/2008/10/19/unofficial-google-reader-gadget-for-igoogle/">my investigations</a> into Google Reader and iGoogle, I ran into an issue which has not been clearly addressed anywhere. The question is: if a site provides a JSON feed without a callback function and you are using it on a different domain (meaning you cannot use XMLHttpRequest), can you still take advantage of it?</p>
<p>The short answer is no &#8211; raw JSON cannot be consumed in a pure client side environment without callbacks. A JSON response is an anonymous literal, so when you include it in your page via a SCRIPT tag it is never assigned to anything your code can reach. It must be passed to eval or to a function in order to become a usable JavaScript object. For example, <a href="http://feeds.delicious.com/v2/json">this delicious feed</a> is raw JSON. If you include it via a SCRIPT tag, it will not work.</p>
<p>The alternatives are a callback, a proxy, something like Google&#8217;s Feeds API, XMLHttpRequest if you are on the same domain, or Flash if crossdomain.xml is defined properly.</p>
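<p>As a rough illustration of the proxy alternative, here is a minimal Python sketch of what such a proxy could do with the raw JSON before handing it to the browser: wrap it in a function call so that a SCRIPT tag has something to execute. The function name <strong>to_jsonp</strong>, the callback name <strong>handleFeed</strong> and the sample data are all made up for this example, and fetching the feed itself is left out.</p>
<pre>
import json
import re

# Conservative pattern for a JavaScript identifier used as the callback name.
CALLBACK_NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

def to_jsonp(raw_json, callback):
    """Wrap a raw JSON document in a function call (JSONP padding)."""
    if not CALLBACK_NAME.fullmatch(callback):
        raise ValueError("unsafe callback name: %r" % callback)
    # Parse and re-serialize so that invalid JSON is rejected, not forwarded.
    data = json.loads(raw_json)
    return "%s(%s);" % (callback, json.dumps(data))

# Hypothetical sample data standing in for a real feed.
print(to_jsonp('[{"d": "example"}]', "handleFeed"))
# prints: handleFeed([{"d": "example"}]);
</pre>
<p>The page would then define <strong>handleFeed</strong> itself and point a SCRIPT tag at the proxy. Checking the callback name matters because whatever the proxy returns is executed by the browser.</p>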
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2008/10/20/json-without-callbacks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unofficial Google Reader gadget for iGoogle</title>
		<link>http://www.shaftek.org/blog/2008/10/19/unofficial-google-reader-gadget-for-igoogle/</link>
		<comments>http://www.shaftek.org/blog/2008/10/19/unofficial-google-reader-gadget-for-igoogle/#comments</comments>
		<pubDate>Mon, 20 Oct 2008 00:13:56 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[bloglines]]></category>
		<category><![CDATA[google reader]]></category>
		<category><![CDATA[igoogle]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/?p=533</guid>
		<description><![CDATA[For the past few weeks, my RSS reader (Bloglines) has not been behaving. Now comes a post on TechCrunch that the founder of Bloglines is considering switching to Google Reader. I started exploring Google Reader to see if it would fit my needs (notice the new feed on the sidebar). One immediate issue I ran [...]]]></description>
			<content:encoded><![CDATA[<p>For the past few weeks, <a href="http://www.bloglines.com">my RSS reader (Bloglines)</a> has not been behaving. Now comes <a href="http://www.techcrunch.com/2008/10/18/googles-destruction-of-bloglines-now-complete/">a post on TechCrunch</a> saying that the founder of Bloglines is considering switching to <a href="http://www.google.com/reader">Google Reader</a>. I started exploring Google Reader to see if it would fit my needs (notice the new feed on the sidebar). One immediate issue I ran into is iGoogle support. I got used to <a href="http://code.google.com/p/bloglines-notifier/">this Bloglines notifier gadget for iGoogle</a> that tells me when I have new items. BUT, the official Google Reader gadget <a href="http://groups.google.com/group/google-reader-troubleshoot/browse_thread/thread/3001c6fdc7cc7f68/bf62fa6410ff5274">does not seem to work under Google Apps</a> (which I use) and the new canvas version <a href="http://groups.google.com/group/google-reader-troubleshoot/browse_thread/thread/5e31cc4f387b5115">fails in Firefox</a>.</p>
<p>After a look at the Gadget API, I whipped up a very small hack for this problem. It is an iGoogle gadget which embeds the mobile version of Google Reader (which fits perfectly inside iGoogle). It is a very simple gadget with <a href="http://code.google.com/apis/gadgets/docs/legacy/fundamentals.html#URL">Content Type = url</a> and is basically just a glorified iFrame. As long as you are logged into Google Reader, it should work. I tested it under Firefox 3 and Opera 9.5 under Ubuntu 8.</p>
<p>To install this gadget, <a href="http://www.google.com/ig/adde?hl=en&amp;moduleurl=http://www.shaftek.org/code/greader-gadget/reader.xml&amp;source=imag">click here</a>. For full source, <a href="http://www.shaftek.org/code/greader-gadget/reader.xml">click here</a>. Comments are welcome to <strong>code /at/ shaftek [dot] org</strong>.</p>
<p>&#8212;&#8212;- What follows is a somewhat technical discussion of some of the gory aspects of this gadget, if you are not a techy, feel free to skip it &#8212;&#8212;-</p>
<p>My initial goal was to make a notifier just like the Bloglines one, that simply shows how many unread items there are and a link to Google Reader. Unfortunately, there is no official API, but there is an unofficial one which has been documented <a href="http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI">here</a>. There is an API call called &#8220;unread-count&#8221; which returns the exact data I needed in either XML or JSON. However, in order to obtain it, you must be logged into Google Reader.</p>
<p>Here is where I ran into cross domain security issues with JavaScript. In order to get the XML output of that API call, my code needs to share the same domain as Google itself. There is no service that Google offers that can do that, probably for security reasons. Flash via crossdomain.xml isn&#8217;t helpful either. Using a proxy wouldn&#8217;t help since I do not have access to the cookies for Google Reader. And to top it off, the content on iGoogle is not served from a Google.com subdomain.</p>
<p>So the next obvious step would be JSON. HOWEVER, the unofficial reader API does not offer a way to specify a callback function for the JSON format. The raw JSON format is useless since the browser does not offer access to it via innerHTML, and it is not parsed into anything usable by itself.</p>
<p>The next alternative after that would be to use the <a href="http://code.google.com/apis/gadgets/docs/legacy/remote-content.html">Gadget API Fetch functions</a> to request the XML or JSON directly. HOWEVER, we run into the same cookie problem as before &#8211; they are served via a server side proxy which is not on the same domain. To get around that problem one could use <a href="http://code.google.com/apis/accounts/docs/AuthForInstalledApps.html">the Account Authentication API</a> which does return the correct tokens. BUT, the Gadget API&#8217;s fetch functions do not support cookies which is the only way to use the security tokens once they have been obtained.</p>
<p>As a last alternative, I was planning on using the Fetch functions to do the account authentication and then a server side proxy via something like AppEngine or my own server, that can be passed the SID and then pass it on as a cookie. However, that would involve more time than necessary so I leave that exercise to the reader.</p>
<p>The end result is a hack that simply embeds the mobile version of Google Reader in iGoogle. It might not be very innovative but hey, it works.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2008/10/19/unofficial-google-reader-gadget-for-igoogle/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Installing Subversion and Trac on 1and1 Shared Hosting</title>
		<link>http://www.shaftek.org/blog/2008/02/28/installing-subversion-and-trac-on-1and1-shared-hosting/</link>
		<comments>http://www.shaftek.org/blog/2008/02/28/installing-subversion-and-trac-on-1and1-shared-hosting/#comments</comments>
		<pubDate>Thu, 28 Feb 2008 04:05:12 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[1and1]]></category>
		<category><![CDATA[subversion]]></category>
		<category><![CDATA[trac]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/blog/2008/02/28/installing-subversion-and-trac-on-1and1-shared-hosting/</guid>
		<description><![CDATA[Installing server side components on shared hosting is always a challenge. In the last few weeks as I have begun to undertake more web based consulting assignments, I have found myself facing the need for source code management as well as project management. At my old startup, we use Subversion in combination with Trac, and [...]]]></description>
			<content:encoded><![CDATA[<p>Installing server side components on shared hosting is always a challenge. In the last few weeks as I have begun to undertake more web based consulting assignments, I have found myself facing the need for source code management as well as project management. At <a href="http://www.solidmatrix.com">my old startup</a>, we use <a href="http://subversion.tigris.org/">Subversion</a> in combination with <a href="http://trac.edgewall.org/">Trac</a>, and I decided to use this for myself as well. Here is a summary of steps I undertook to install these on 1&amp;1 shared hosting:</p>
<p>1. For Subversion, I basically followed <a href="http://joemaller.com/2008/01/29/how-to-install-subversion-on-a-shared-host/">this guide</a> written by Joe Maller. For 1&amp;1 there are two important notes &#8211; you need to specify <strong>--without-neon</strong> in order to avoid errors with svn/python bindings, and it also helps to install everything into a single directory.</p>
<p>2. For Trac, you need to download and extract Python 2.5. Then you can install it with <strong>./configure --prefix=SOME DIRECTORY</strong>. After installation, just update your PATH in .bash_profile and log in again. <strong>KEEP IN MIND: </strong>Subversion will only be available via SSH and only to a single user. 1&amp;1 does not allow multiple SSH users on one account AND they do not have the SSH module installed in Apache. If you are looking for a multiuser Subversion, try a different provider.</p>
<p>3. After Trac is installed, you can use trac.cgi or trac.fcgi for the web based stuff. HOWEVER, do change the shebang line to point to the Python 2.5 executable in your home directory.</p>
<p>4. SQLite does not work on 1&amp;1. Use MySQL instead.</p>
<p>Unfortunately, after spending a sizable amount of my day on this, I realized that I need multiuser access. 1&amp;1 does not allow more than one SSH user so I wasted my time with this. Instead, I just got an account with DreamHost which will host SVN and Trac. While DreamHost isn&#8217;t the most reliable host in the world, it looks like I will be using 1&amp;1 for the more reliable stuff like websites and DreamHost for development.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2008/02/28/installing-subversion-and-trac-on-1and1-shared-hosting/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Why the Left() Function Stops Working in VBA</title>
		<link>http://www.shaftek.org/blog/2007/11/22/why-the-left-function-stops-working-in-vba/</link>
		<comments>http://www.shaftek.org/blog/2007/11/22/why-the-left-function-stops-working-in-vba/#comments</comments>
		<pubDate>Fri, 23 Nov 2007 03:54:10 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[vba]]></category>
		<category><![CDATA[windows]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/blog/2007/11/22/why-the-left-function-stops-working-in-vba/</guid>
		<description><![CDATA[This is an interesting problem that my wife had at work recently. In a VBA-based program, the Left function suddenly stopped working with an error along the lines of &#8220;type data mismatch&#8221;. Being that this is a native function to VBA, my first thoughts were that it was caused by some unwary VBA upgrade. However, [...]]]></description>
			<content:encoded><![CDATA[<p>This is an interesting problem that my wife had at work recently. In a VBA-based program, the Left function suddenly stopped working with an error along the lines of &#8220;type data mismatch&#8221;. Being that this is a native function to VBA, my first thoughts were that it was caused by some unwary VBA upgrade. However, the truth turned out to be more interesting.</p>
<p>It seems that when VBA is compiled, other libraries may import their own functions into the global VBA namespace. What that means is that if some library has another global <strong>Left</strong> function, it would override the VBA built in function since the VBA native stuff is resolved last. A better description of this problem can be found at <a href="http://www.papwalker.com/dllhell/index-page3.html">this DLL Hell site</a> hosted by Walker Software.</p>
<p>In my wife&#8217;s instance, it turned out that the <strong>MSHTML</strong> library was the culprit, in particular the <strong><a href="http://msdn2.microsoft.com/en-us/library/aa703613.aspx">IMarkupPointer.Left</a></strong> function which for some reason was exposed globally. Microsoft has <a href="http://support.microsoft.com/kb/276560">an article</a> describing a similar problem with the <strong>Devshl.dll</strong> library.</p>
<p>What is the solution? Fully qualify the function name by adding the namespace, which means calling <strong>VBA.Left</strong> instead of plain <strong>Left</strong>. Of course, if you have millions of lines of code that use the plain function, you are in trouble. Since this is VBA 6, which is not .NET and is not really being supported by Microsoft, tough luck.</p>
<p>Maybe it is time to try out Java or PHP?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2007/11/22/why-the-left-function-stops-working-in-vba/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Flattening Transparencies in PDF with Free Tools</title>
		<link>http://www.shaftek.org/blog/2007/11/09/flattening-transperancies-in-pdf-with-free-tools/</link>
		<comments>http://www.shaftek.org/blog/2007/11/09/flattening-transperancies-in-pdf-with-free-tools/#comments</comments>
		<pubDate>Fri, 09 Nov 2007 16:49:39 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[pdf]]></category>
		<category><![CDATA[printing]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/blog/2007/11/09/flattening-transperancies-in-pdf-with-free-tools/</guid>
		<description><![CDATA[An interesting issue has come up recently with my publishing company - one of our printing suppliers flagged incoming PDFs as being unprintable due to transparencies. After looking around for solutions, I came up with a way to resolve the issue without resorting to Acrobat (which we don&#8217;t use). The solution is twofold: 1. First convert the incoming PDF to PostScript using XPDF&#8217;s pdftops. This will flatten the transparencies. GhostScript&#8217;s pdf2ps tool DOES NOT do that. 2. Then convert the PostScript back to PDF using GhostScript&#8217;s ps2pdf tool. Both tools are open source and free (although watch out for GhostScript&#8217;s GPL license). One important point &#8211; pdftops requires a paper width and height unless [...]]]></description>
			<content:encoded><![CDATA[<p>An interesting issue has come up recently with <a HREF="http://www.publishyoursefer.com">my publishing company</a> - one of our printing suppliers flagged incoming PDFs as being unprintable due to transparencies. After looking around for solutions, I came up with a way to resolve the issue without resorting to Acrobat (which we don&#8217;t use). The solution is twofold:</p>
<p>1. First convert the incoming PDF to PostScript using <a HREF="http://www.foolabs.com/xpdf/">XPDF</a>&#8217;s pdftops. This will flatten the transparencies. GhostScript&#8217;s pdf2ps tool DOES NOT do that.</p>
<p>2. Then convert the PostScript back to PDF using <a HREF="http://pages.cs.wisc.edu/~ghost/">GhostScript</a>&#8217;s ps2pdf tool.</p>
<p>Both tools are open source and free (although watch out for GhostScript&#8217;s GPL license). One important point &#8211; <strong>pdftops</strong> needs to be given a paper width and height unless you want to end up with 8.5&#215;11 (US Letter) paper for everything. The width and height are specified in <a HREF="http://en.wikipedia.org/wiki/Point_(typography)#Current_DTP_point_system">PostScript points</a>, of which there are 72 per inch.</p>
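<p>To make the point arithmetic concrete, here is a small Python sketch that converts a trim size in inches to PostScript points and builds the two command lines. It assumes the <strong>-paperw</strong> and <strong>-paperh</strong> options of XPDF&#8217;s pdftops. The file names, the 6 by 9 inch page and the helper names are invented for this example, and the script only prints the commands rather than running them.</p>
<pre>
import shlex

POINTS_PER_INCH = 72  # PostScript points per inch

def inches_to_points(inches):
    """Convert a length in inches to whole PostScript points."""
    return round(inches * POINTS_PER_INCH)

def flatten_commands(src_pdf, out_pdf, width_in, height_in):
    """Build the two command lines: PDF to PostScript, then back to PDF."""
    tmp_ps = out_pdf + ".ps"  # hypothetical name for the intermediate file
    to_ps = ["pdftops",
             "-paperw", str(inches_to_points(width_in)),
             "-paperh", str(inches_to_points(height_in)),
             src_pdf, tmp_ps]
    to_pdf = ["ps2pdf", tmp_ps, out_pdf]
    return [to_ps, to_pdf]

# Example: a 6 by 9 inch page is 432 by 648 points.
for cmd in flatten_commands("input.pdf", "flattened.pdf", 6, 9):
    print(shlex.join(cmd))
</pre>
<p>Each list can be passed to <strong>subprocess.run</strong> as is once both tools are on the PATH.</p>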
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2007/11/09/flattening-transperancies-in-pdf-with-free-tools/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Another Book Search Engine Experiment</title>
		<link>http://www.shaftek.org/blog/2007/07/30/another-book-search-engine-experiment/</link>
		<comments>http://www.shaftek.org/blog/2007/07/30/another-book-search-engine-experiment/#comments</comments>
		<pubDate>Mon, 30 Jul 2007 04:17:31 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[bookchaser.net]]></category>
		<category><![CDATA[books]]></category>
		<category><![CDATA[google]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/blog/2007/07/30/another-book-search-engine-experiment/</guid>
		<description><![CDATA[About two years ago I coded a small experimental search engine for books which used Ajax and Amazon web services. Recently, I went back to the same concept and put up a new experiment &#8211; a meta search engine for book information that aggregates book data from about 60 different sites on the Internet &#8211; [...]]]></description>
			<content:encoded><![CDATA[<p>About two years ago <a HREF="http://www.shaftek.org/blog/archives/000310.html">I coded</a> a small experimental search engine for books which used Ajax and Amazon web services. Recently, I went back to the same concept and put up a new experiment &#8211; a meta search engine for book information that aggregates book data from about 60 different sites on the Internet &#8211; including social networks, book databases and download projects such as Project Gutenberg. It was built using <a HREF="http://www.google.com/coop/cse">Google Custom Search Engine</a>. The result can be found here:</p>
<p><a HREF="http://www.bookchaser.net">BookChaser.Net</a></p>
<p>The old Amazon project had stopped working and I had taken it offline.</p>
<p>Comments are welcome to <strong>code /at/ shaftek [dot] org</strong>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2007/07/30/another-book-search-engine-experiment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Converting from DJVU to PDF</title>
		<link>http://www.shaftek.org/blog/2007/07/25/converting-from-djvu-to-pdf/</link>
		<comments>http://www.shaftek.org/blog/2007/07/25/converting-from-djvu-to-pdf/#comments</comments>
		<pubDate>Wed, 25 Jul 2007 14:58:30 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[conversion]]></category>
		<category><![CDATA[djvu]]></category>
		<category><![CDATA[pdf]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/blog/2007/07/25/converting-from-djvu-to-pdf/</guid>
		<description><![CDATA[One of the more mundane tasks that faces every publishing business like mine is data conversion. Recently, I have been involved in a major project which seeks to make available several hundred titles in print on demand format. Unfortunately, the library that scanned these titles did not use PDF &#8211; rather they used a more [...]]]></description>
			<content:encoded><![CDATA[<p>One of the more mundane tasks that faces every publishing business like <a HREF="http://www.publishyoursefer.com">mine</a> is data conversion. Recently, I have been involved in a major project which seeks to make available several hundred titles in print on demand format. Unfortunately, the library that scanned these titles did not use PDF &#8211; rather they used a more obscure format called DjVu (see <a HREF="http://en.wikipedia.org/wiki/DjVu">Wikipedia</a> for more information). This format was invented by AT&amp;T Labs (which also invented VNC). It claims to compress data better than PDF, but goes about it in an unusual way. Unlike PDF, which stores most documents in one layer, DjVu actually uses 3 layers &#8211; background, foreground and mask. The mask layer usually has the text, the background has the picture of the page it was scanned from and the foreground has the rest. Some fancy algorithm is used to determine what goes where when the scan is originally encoded.</p>
<p>However, in the printing business DjVu is not used &#8211; rather everything needs to be in PDF. So in this post I will outline how I was able to successfully convert DjVu files to PDF using freely available tools. But first, here are some things that DID NOT work:<br />
1. Converting DjVu to Postscript and then to PDF &#8211; takes too long.<br />
2. Converting only the foreground layer or the mask layer in DjVu &#8211; loses some of the data.</p>
<p>Here is the software that is needed for the conversion to take place:<br />
1. <a HREF="http://djvu.sourceforge.net/">DjVuLibre</a> .<br />
2. <a HREF="http://www.libtiff.org/">LibTiff</a> .</p>
<p>If you are running Windows, then you will need <a HREF="http://www.cygwin.org">Cygwin</a> and a Cygwin version of DjVuLibre. The compiled Windows version does not include TIFF support (although you can get <a HREF="http://www.djvu-soft.narod.ru/DjVuLibre_3_5_16_Win32_Tiff.rar">this package</a> from a site in Russia which includes TIFF support). LibTiff comes in a native Windows version.</p>
<p>Here are the conversion steps:<br />
1. Convert DjVu file to TIFF using ddjvu.<br />
2. Convert TIFF to PDF using tiff2pdf.</p>
<p>Assuming the input DjVu file is called &#8220;input.djvu&#8221;, here are the commands:</p>
<pre>
ddjvu -verbose -format=tiff input.djvu output.tif
tiff2pdf -o output.pdf output.tif</pre>
<p>The ddjvu utility has an option to convert specific layers. One common mistake is to convert only the mask layer or the foreground layer. Technically speaking, the mask layer is the one that should have the actual text, but in practice I have seen the DjVu encoder occasionally put portions of the text in the background layer. Thus, if you only take the foreground or mask layers, you will lose those bits in the background. If your specific files don&#8217;t have that issue, then you should use the layer switch since it reduces file size and increases readability.</p>
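<p>For batch work, the layer choice can be made a parameter. The Python sketch below builds both command lines and optionally adds a layer, assuming the layer switch is the <strong>-mode</strong> option of ddjvu with the mode names listed in its manual page. The helper name and file names are invented for this example, and nothing is executed here.</p>
<pre>
import shlex

# Rendering modes documented for the -mode option of ddjvu.
MODES = ("color", "black", "mask", "foreground", "background")

def djvu_to_pdf_commands(src, out_pdf, mode=None):
    """Build the ddjvu and tiff2pdf command lines for one file."""
    tiff = out_pdf + ".tif"  # hypothetical intermediate file name
    ddjvu = ["ddjvu", "-format=tiff"]
    if mode is not None:
        if mode not in MODES:
            raise ValueError("unknown ddjvu mode: %r" % mode)
        ddjvu.append("-mode=" + mode)
    ddjvu += [src, tiff]
    return [ddjvu, ["tiff2pdf", "-o", out_pdf, tiff]]

# All layers by default; pass mode="mask" only for files known to be safe.
for cmd in djvu_to_pdf_commands("input.djvu", "output.pdf"):
    print(shlex.join(cmd))
</pre>
<p>Leaving <strong>mode</strong> unset keeps every layer, which is the safe default for files where text may have ended up in the background.</p>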
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2007/07/25/converting-from-djvu-to-pdf/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Ajax Search Engine Without Servers (Almost)</title>
		<link>http://www.shaftek.org/blog/2007/07/10/an-ajax-search-engine-without-servers-almost/</link>
		<comments>http://www.shaftek.org/blog/2007/07/10/an-ajax-search-engine-without-servers-almost/#comments</comments>
		<pubDate>Wed, 11 Jul 2007 02:40:00 +0000</pubDate>
		<dc:creator>Yakov Shafranovich</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[ajax]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[shaftag]]></category>

		<guid isPermaLink="false">http://www.shaftek.org/blog/2007/07/10/an-ajax-search-engine-without-servers-almost/</guid>
		<description><![CDATA[For a while I have been working on a hobby project trying to make a meta-search engine that you can use to search multiple search engines by tag. The catch? No server side components. This search engine works client side only from the user&#8217;s browser by using RSS feeds from search engines and Google&#8217;s AJAX [...]]]></description>
			<content:encoded><![CDATA[<p>For a while I have been working on a hobby project trying to make a meta-search engine that you can use to search multiple search engines by tag. The catch? No server side components. This search engine works client side only from the user&#8217;s browser by using RSS feeds from search engines and <a HREF="http://code.google.com/apis/ajaxfeeds/">Google&#8217;s AJAX Feed API</a> to access them from the client browser.</p>
<p>The engine is called <strong>&#8220;ShafTag&#8221;</strong> and can be found at <a HREF="http://www.shaftag.com">www.shaftag.com</a>.</p>
<p>Comments are welcome at code /at/ shaftek [dot] org.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.shaftek.org/blog/2007/07/10/an-ajax-search-engine-without-servers-almost/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>


