
Removing Vowels from Hebrew Unicode Text

Posted June 3, 2005 – 4:28 pm by Yakov Shafranovich in Politics, Programming

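One of the questions that recently came up is how to remove the vowels from Hebrew text in Unicode (or in any similar language whose vowels are encoded as separate marks). A quick look at the Hebrew Unicode chart shows that the vowel points all fall between 0x0591 (1425) and 0x05C7 (1479). That range also holds the cantillation marks and a few punctuation signs, such as maqaf (U+05BE) and sof pasuq (U+05C3), so those are removed too. With this and JavaScript’s charCodeAt method, it is trivial to strip them out: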

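function stripVowels(rawString)
{
	// Keep every character outside U+0591..U+05C7 (1425..1479), the range
	// holding the Hebrew vowel points, cantillation marks and a few punctuation signs.
	var newString = '';
	for (var j = 0; j < rawString.length; j++) {
		var code = rawString.charCodeAt(j);
		if (code < 1425 || code > 1479)
		{ newString = newString + rawString.charAt(j); }
	}
	return newString;
}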

You can test it here.
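For comparison, here is a sketch of the same filter written as a single regular-expression replace, plus a narrower variant. The names `stripVowelsRegex` and `stripVowelsKeepPunctuation` are hypothetical, and the sketch assumes an engine with ES2015 `const`. The narrower character class is based on the Hebrew block chart: it still removes the cantillation marks and points, but keeps the punctuation signs maqaf (U+05BE), paseq (U+05C0), sof pasuq (U+05C3) and nun hafukha (U+05C6), which the plain U+0591..U+05C7 range strips along with the vowels.

```javascript
// Same range as the loop above: U+0591..U+05C7 (cantillation marks,
// vowel points, and the few punctuation signs in between).
const HEBREW_MARKS = /[\u0591-\u05C7]/g;

function stripVowelsRegex(rawString) {
  return rawString.replace(HEBREW_MARKS, '');
}

// Narrower variant (hypothetical): remove cantillation marks and points,
// but keep maqaf (U+05BE), paseq (U+05C0), sof pasuq (U+05C3)
// and nun hafukha (U+05C6).
const HEBREW_POINTS_ONLY = /[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]/g;

function stripVowelsKeepPunctuation(rawString) {
  return rawString.replace(HEBREW_POINTS_ONLY, '');
}

// "shalom" with points (shin, qamats, shin dot, lamed, vav, holam, final mem)
// becomes the unpointed letters shin, lamed, vav, final mem.
console.log(stripVowelsRegex('\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD'));
```

The regex form avoids building the result one character at a time; which variant fits depends on whether maqaf or sof pasuq in the source text should survive.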


4 Responses to “Removing Vowels from Hebrew Unicode Text”

  1. Hello,
    I’d very much like to use your code to strip the vowels from either a Unicode file or an Excel spreadsheet with multiple Hebrew words. How can I make it work? Thanks so much.
    Best,
    Lance

    By Lance Laytner on Aug 31, 2008

  2. Thank you! This code is incredible, even if I haven’t any idea how to actually use it on my Mac… So I’ll be using your page to remove the vowels when I need to. I hope you’ll keep it right here.

    By Jeff on Oct 4, 2009

  3. Thanks. You saved me a lot of time and tedious data entry!

    By Jay R on Oct 10, 2011

  4. This was very helpful. Thank you! It works wonderfully for stripping nikudos off of a shoresh. I translated your algorithm to Ruby, for use in a personal project.

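    # Keep every character outside U+0591..U+05C7 (1425..1479).
    # mb_chars comes from ActiveSupport (Rails); in plain Ruby 1.9+,
    # rawString.each_char works directly on UTF-8 strings.
    def stripVowels(rawString)
      newString = ''
      rawString.mb_chars.each_char do |c|
        newString << c if c.ord < 1425 || c.ord > 1479
      end
      return newString
    end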

    By Dovid Harrison on Dec 9, 2011
