Tuesday, 10 July 2012

Multi byte safe coding


Again, sorry it's been a little while since I posted (this seems to happen a lot), but I've been quite busy at work on my PHP login script, building a few client sites and learning After Effects for more client work.

This post is all about multi byte characters and how to write your code so it can handle them, which for the most part isn't too hard in PHP. Everything I'm writing in this post is based around UTF-8, but I believe it'll be applicable to some other types of encoding. It's worth looking that up for yourselves if it's important to you, as there are far too many types of encoding for me to look into.

What is a multi byte character?
A multi byte character is basically what it sounds like: a single character which takes up more than one byte of space. To explain that a bit better, if you save a plain text file with only one English letter in it, it'll use up 1 byte of space on the hard drive. The same goes for every other unaccented English letter and every digit.

Now this isn't the same for all languages in the world. For example, Chinese characters, accented letters, many mathematical symbols and quite a few others need more than one byte each to represent them in UTF-8.
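To make the difference concrete, here is a minimal sketch that compares byte counts with character counts. It assumes the mbstring extension is loaded; the sample characters and the $samples / $byteCounts names are only illustrative and don't come from any particular project. The characters are written as raw UTF-8 bytes so the result doesn't depend on how the file itself is saved.

```php
<?php
// Samples written as explicit UTF-8 byte sequences (illustrative choices).
$samples = array(
    'English letter'    => 'a',
    'Digit'             => '7',
    'Accented letter'   => "\xC3\xA9",     // é
    'Summation sign'    => "\xE2\x88\x91", // ∑
    'Chinese character' => "\xE4\xB8\xAD", // 中
);

$byteCounts = array();
foreach ($samples as $label => $char) {
    // strlen() counts bytes; mb_strlen() counts characters.
    $byteCounts[$label] = strlen($char);
    printf(
        "%-18s %d byte(s), %d character(s)\n",
        $label,
        strlen($char),
        mb_strlen($char, 'UTF-8')
    );
}
```

Every sample is a single character, but only the letter and the digit fit in 1 byte; the accented letter takes 2 bytes and the other two take 3 each.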

Do you need to worry about being multi byte safe?
It all depends on your site / application and whether you think it'll need to handle multi byte data. Personally, I'm thinking I may start to write & rewrite all the generic classes which I use (or intend to use) in many other places to be multi byte safe; my secure PHP login is going to be a good place to start.
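As a rough illustration of what can go wrong when code isn't multi byte safe, the sketch below shortens a name with substr and with mb_substr. The $name value is a made-up example, and the mbstring extension is assumed to be available.

```php
<?php
// "Zoë" as UTF-8: 3 characters but 4 bytes (hypothetical input).
$name = "Zo\xC3\xAB";

// substr() works on bytes, so asking for 3 "characters" cuts "ë" in half.
$byteCut = substr($name, 0, 3);

// mb_substr() works on characters, so all three survive intact.
$charCut = mb_substr($name, 0, 3, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false): broken UTF-8
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
```

The byte-based version leaves a stray half character at the end of the string, which is the kind of thing that tends to show up later as a garbled symbol on the page or a rejected database insert.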

As far as I can tell there isn't a downside to writing multi byte safe code, but I haven't personally done any performance testing yet and I can't find anything online that says otherwise. If you find something different please leave a comment.

Little bit of extra info
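I've put a link to the php.net site which lists all the PHP functions that have multi byte equivalents. The standard explode & str_replace functions also look to be multi byte safe with UTF-8 as far as I can tell, because they simply match bytes and the bytes of one UTF-8 character can never match part of a different character. That may not hold for other encodings.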
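A small sketch of both points, assuming UTF-8 input and the mbstring extension: explode and str_replace leave the multi byte characters intact, while a position-based function such as strpos reports a byte offset that differs from the character offset returned by mb_strpos. The $csv string and the other variable names are invented for the example.

```php
<?php
// "café,naïve,中文" written as explicit UTF-8 bytes (illustrative data).
$csv = "caf\xC3\xA9,na\xC3\xAFve,\xE4\xB8\xAD\xE6\x96\x87";

// explode() splits on the comma byte, which never occurs inside a
// multi byte UTF-8 character, so every piece is still valid UTF-8.
$parts = explode(',', $csv);

// str_replace() swaps the whole two-byte "é" sequence for a plain "e".
$replaced = str_replace("\xC3\xA9", 'e', $csv);

// strpos() counts bytes, mb_strpos() counts characters.
$bytePos = strpos($csv, 'na');                // 6
$charPos = mb_strpos($csv, 'na', 0, 'UTF-8'); // 5

foreach ($parts as $part) {
    var_dump(mb_check_encoding($part, 'UTF-8')); // bool(true) each time
}
```

The offsets differ by one because "é" sits before the match and counts as two bytes but only one character, so mixing strpos results with mb_substr (or the other way round) is an easy way to introduce off-by-one bugs.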

Further information

