Unicode Grabber
Sorry for the hiatus – not sure who knows what, but I got married in late spring of this year. It was super busy before the wedding and things haven’t settled down since. We still haven’t decorated the new place yet or anything. In time…
Anyhow, I’ve built this little tool that I needed dearly for a project I was working on and thought I’d share it. I will soon post re-distributable code for everyone if anyone wants it. For now, I will give a short explanation and let you have at it.
The UnicodeGrabber is a tool that will parse any .txt or .xml file, grab all of the characters (only characters in node values/attributes for XML files) and return their Unicode Hex values. It also strips duplicates from the list.
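The "grab every character, strip duplicates" step described above can be sketched roughly like this. It is a minimal illustration, not the tool's actual source: the function name `collectUniqueCodes` is made up, and it assumes the file's text (or, for XML, the extracted node values and attributes) is already in a single String.

```actionscript
package {
    /**
     * Hypothetical helper (not taken from the UnicodeGrabber source).
     * Returns the distinct character codes found in `input`,
     * in order of first appearance.
     */
    public function collectUniqueCodes(input:String):Array {
        var seen:Object = {};   // plain Object used as a set, keyed by character code
        var codes:Array = [];

        for (var i:int = 0; i < input.length; i++) {
            var code:Number = input.charCodeAt(i);
            if (!seen[code]) {
                seen[code] = true;  // remember this code so repeats are skipped
                codes.push(code);
            }
        }
        return codes;
    }
}
```

For example, `collectUniqueCodes("banana")` returns `[98, 97, 110]`, the codes for b, a and n. Note that this sketch keeps whitespace and line breaks too; whether those should be filtered out depends on what you want embedded.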
What is this good for? I needed to create a lightweight font package for some Asian Flash pieces, and the only way to do that is to embed just the characters you need. Since I can't read any CJK script, I decided to look to code. When you use the Flex SDK to create a font SWF it will (by default) embed the entire character set. Nice. Handy. Except when you need an Asian font package and realize that embedding every character of a Japanese font will weigh down your font SWF to about 12 megabytes. So, it was either make this little app or a really REALLY cool pre-loader…
Enjoy!
BTW, if it breaks please let me know. Take a screenshot if you can.
The sample font.as class can be found here.


I do need to give credit where credit is due. I couldn’t remember the blog post at the time, but this was inspired by a blog post by Mr. Doob on April 23rd 2009 entitled “Optimizing Asian Fonts for Multi-Language Flash Sites”. The perma-link has disappeared, but it’s still available on his blog.
Originally, his PHP script was the tool that I used for grabbing the Unicode characters – it was a lifesaver! But I needed my tool to evolve, so I rewrote that functionality in one line of ActionScript:
var unicodeHexID:String = new String(_input.charCodeAt(i).toString(16).toUpperCase());
This will only give you the bare hex digits of the character code, so you have to pad and prefix the string as necessary:
while(unicodeHexID.length<4) unicodeHexID="0"+unicodeHexID;
unicodeHexID = "U+" + unicodeHexID;
The first line keeps adding a zero (0) to the front of the string until it is 4 characters long, so a 2-digit ID like 41 becomes 0041 and a 3-digit ID like 3B1 becomes 03B1. Because the loop runs until the length reaches 4, a separate check for shorter IDs isn't needed. Flex SDK won't understand a Unicode ID that is only 2 or 3 spots long. On top of that, the last line adds a "U+" to the front of the string. This is also necessary for the Flex SDK to understand that the item in the array is indeed a Unicode hex ID. Yay.
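Pulled together, the conversion could be wrapped in one small function. This is only a sketch: the name `toUnicodeHexID` is hypothetical, and it takes a character code (such as the result of `charCodeAt`) rather than reading from `_input` directly.

```actionscript
package {
    /**
     * Hypothetical helper: turns a character code into a "U+XXXX" style ID.
     * Codes with fewer than 4 hex digits are left-padded with zeros.
     */
    public function toUnicodeHexID(code:uint):String {
        var hex:String = code.toString(16).toUpperCase();

        // Pad on the left until there are at least 4 hex digits.
        while (hex.length < 4) {
            hex = "0" + hex;
        }
        return "U+" + hex;
    }
}
```

For example, `toUnicodeHexID("A".charCodeAt(0))` returns `"U+0041"`. Keep in mind that `charCodeAt` works on 16-bit code units, so a character outside the Basic Multilingual Plane would show up as two separate IDs (a surrogate pair) rather than one.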
Keep in mind that this is rather sloppy and, well, there are probably more elegant ways to do it. This is just a developer’s tool to embed fonts. “The cobbler’s kids have no shoes…”, etc.
Happy Coding!
jevin kones