Removing Vowels from Hebrew Unicode Text
Posted June 3, 2005 – 4:28 pm by Yakov Shafranovich in Politics, Programming
One of the questions that came up recently is how to remove vowels from Hebrew text in Unicode (or any other similar language). A quick look at the Hebrew Unicode chart shows that the vowel points all fall between 0x0591 (1425) and 0x05C7 (1479). That range also contains the cantillation marks and a few punctuation characters (maqaf, paseq, sof pasuq and nun hafukha), which are stripped along with the vowels. With this range and JavaScript's charCodeAt function, stripping them out is trivial:
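function stripVowels(rawString)
{
  // Keep every character outside U+0591 (1425) .. U+05C7 (1479);
  // drop the points and marks inside that range.
  var newString = '';
  for (var j = 0; j < rawString.length; j++) {
    var code = rawString.charCodeAt(j);
    if (code < 0x0591 || code > 0x05C7) {
      newString += rawString.charAt(j);
    }
  }
  return newString;
}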
You can test it here.
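For comparison, the same filter can be written with a regular-expression character class instead of a loop. This is a sketch, not the author's code, and the function names `stripVowelsRegex` and `stripMarksKeepPunctuation` are hypothetical. The second function assumes you want to keep the punctuation that shares the range (maqaf U+05BE, paseq U+05C0, sof pasuq U+05C3, nun hafukha U+05C6) and remove only the points and cantillation marks.

```javascript
// Same range as stripVowels: U+0591 (1425) through U+05C7 (1479).
function stripVowelsRegex(rawString) {
  return rawString.replace(/[\u0591-\u05C7]/g, '');
}

// Hypothetical variant: removes points and cantillation marks but keeps
// maqaf (U+05BE), paseq (U+05C0), sof pasuq (U+05C3), nun hafukha (U+05C6).
function stripMarksKeepPunctuation(rawString) {
  return rawString.replace(/[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]/g, '');
}

// Shin + shin dot + qamats, lamed, vav + holam, final mem (pointed "shalom").
var pointed = '\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD';
console.log(stripVowelsRegex(pointed)); // prints שלום
```

Every Hebrew code point lies in the Basic Multilingual Plane, so working on UTF-16 code units (as charCodeAt and these regexes do) is safe here. Presentation-form characters such as U+FB2E (alef with patah) carry the point inside a single code point and pass through unchanged; normalizing with String.prototype.normalize('NFD') first would split most of them into a letter plus a separate point.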
4 Responses to “Removing Vowels from Hebrew Unicode Text”
Hello,
I’d very much like to use your code to strip the vowels from either a Unicode file or an Excel spreadsheet with multiple Hebrew words. How can I make it work? Thanks so much.
Best,
Lance
By Lance Laytner on Aug 31, 2008
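Lance's question about applying the filter to a whole file could be handled with a short Node.js script. This is a minimal sketch, assuming Node.js is installed and the input is plain UTF-8 text; the script name `strip-vowels.js` and the `stripFile` helper are hypothetical. For a spreadsheet, one possible route is to save it as UTF-8 text or CSV first and run the script on that file.

```javascript
// Hypothetical command-line helper: strip Hebrew points and cantillation
// marks (U+0591..U+05C7) from a UTF-8 text file.
const fs = require('fs');

// Same range as the post's stripVowels, written as a regex.
function stripVowels(rawString) {
  return rawString.replace(/[\u0591-\u05C7]/g, '');
}

// Read the whole file as UTF-8, filter it, and write the result out.
function stripFile(inPath, outPath) {
  const text = fs.readFileSync(inPath, 'utf8');
  fs.writeFileSync(outPath, stripVowels(text), 'utf8');
}

// Usage: node strip-vowels.js input.txt output.txt
// (does nothing if the two paths are not given)
const [inPath, outPath] = process.argv.slice(2);
if (inPath && outPath) {
  stripFile(inPath, outPath);
}
```

The script reads the whole file into memory, which is fine for word lists; a stream-based version would suit very large inputs better.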
Thank you! This code is incredible, even if I haven't any idea how to actually use it on my Mac. So I'll be using your page to remove the vowels when I need to. I hope you'll keep it right here.
By Jeff on Oct 4, 2009
Thanks. You saved me a lot of time and tedious data entry!
By Jay R on Oct 10, 2011
This was very helpful. Thank you! It works wonderfully for stripping nikudos off of a shoresh. I translated your algorithm to Ruby, for use in a personal project.
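def stripVowels(rawString)
  # Ruby 1.9+ strings are encoding-aware, so each_char yields whole
  # UTF-8 characters (the original used Rails' mb_chars for this).
  newString = ''
  rawString.each_char do |c|
    # keep anything outside U+0591..U+05C7 (1425..1479)
    newString << c if c.ord < 1425 || c.ord > 1479
  end
  return newString
end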
By Dovid Harrison on Dec 9, 2011