Google Base and Unicode
Posted August 11, 2009 – 8:41 am by Yakov Shafranovich in Website
For quite some time, Google Base feeds for some of my projects were either partially ingested or rejected outright with the message “Required attribute missing”. I ran xmllint and several online validation tools and found nothing wrong. But thanks to a Mac blog, I finally figured it out.
It seems that while Google Base officially supports Unicode and UTF-8 encoding in XML feeds, as stated here, it doesn’t support it fully. Instead of accepting plain UTF-8 text, Google Base apparently requires non-ASCII characters to be written as numeric character references like &#NNNN;, where NNNN is the character’s Unicode code point in decimal (&#xHHHH; is the equivalent hexadecimal form). This was originally found by this blogger.
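To make the required format concrete, here is a minimal Python sketch of the transformation. The function name to_char_refs is hypothetical and not part of any Google Base tooling. It assumes the input is text that has already been XML-escaped, since it only rewrites non-ASCII characters and leaves &, < and > alone.

```python
def to_char_refs(text: str) -> str:
    """Hypothetical helper: replace every non-ASCII character with a
    decimal numeric character reference (&#NNNN;).

    ASCII characters (code points below 128) pass through unchanged.
    Assumes &, < and > have already been escaped by the caller.
    """
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


print(to_char_refs("Crème brûlée"))  # Cr&#232;me br&#251;l&#233;e
```

Python strings iterate over code points, so a character outside the Basic Multilingual Plane (for example U+1F600) becomes a single reference, &#128512;, rather than a pair of surrogate references.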
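The solution, in XSLT at least, is to set the output encoding to us-ascii (the encoding attribute of xsl:output), which makes the processor write every non-ASCII character as a character reference. In Perl you can probably use Encode.pm (its FB_XMLCREF fallback substitutes &#xHHHH; references for characters the target encoding cannot represent) or iconv.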
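The same trick carries over to any XML serializer that lets you choose the output encoding. Below is a minimal Python sketch, assuming the feed is built with the standard library’s xml.etree.ElementTree; the build_item helper and the item/title/description layout are illustrative only, not Google Base’s actual feed schema.

```python
import xml.etree.ElementTree as ET


def build_item(title: str, description: str) -> bytes:
    """Hypothetical helper: serialize a feed <item> as US-ASCII bytes.

    When the output encoding is us-ascii, ElementTree encodes with the
    "xmlcharrefreplace" error handler, so any character outside ASCII is
    written as &#NNNN; instead of raw UTF-8 bytes, much like an XSLT
    stylesheet with encoding="us-ascii" on xsl:output.
    """
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "description").text = description
    return ET.tostring(item, encoding="us-ascii")


print(build_item("Crème brûlée", "Plain ASCII text").decode("ascii"))
```

Letting the serializer do the substitution avoids double-escaping mistakes. For strings that are already XML-escaped, str.encode("ascii", "xmlcharrefreplace") performs the same substitution in a single call.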
Many thanks to Michael Fourman of the Mac Tips blog for this.
Tags: google base, unicode, xslt