Recently, I had to build a multilingual portal, and I soon ran into character encoding problems. Gibberish would appear in place of special characters such as umlauts, circumflex accents, etc. Naturally, I checked the PHP manual, the MySQL manual, blogs, etc. The solution is easy, but the problem has to be solved in several places. Basically, you need to make sure that your database and your PHP input and output all use UTF-8.

1. Set MySQL charset to utf-8
The important part is the charset and collate values on the last line.

create table collection (
 cid int not null primary key,
 name varchar(255) not null
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE utf8_general_ci;

2. Set PHP encoding to UTF-8
Add the following line at the top of your PHP code. It makes the multibyte string (mbstring) functions use UTF-8 instead of the default character encoding, which is latin1 (ISO-8859-1) on older PHP versions.

mb_internal_encoding("UTF-8");
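
To see why the internal encoding matters, here is a small sketch comparing a byte-oriented string function with its multibyte counterpart. It assumes the mbstring extension is loaded. The sample string is my own, and the umlaut is written as byte escapes so the result does not depend on how the file itself is saved.

```php
<?php
// Assumes the mbstring extension is available.
mb_internal_encoding("UTF-8");

// "Müller" with the u-umlaut written as its two UTF-8 bytes (0xC3 0xBC).
$name = "M\xC3\xBCller";

// strlen() counts bytes, so the umlaut is counted twice.
$bytes = strlen($name);

// mb_strlen() counts characters using the internal encoding set above.
$chars = mb_strlen($name);

// mb_strtoupper() is UTF-8 aware and upper-cases the umlaut as well.
$upper = mb_strtoupper($name);

echo "$bytes bytes, $chars characters, upper: $upper\n";
```

If the two counts differ for your own strings, you are dealing with multibyte characters and should use the mb_* functions throughout.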

3. Configure php.ini
If you wish to change the default charset to utf8, open php.ini and set the following:

default_charset = "utf-8"
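
If you cannot edit php.ini (on shared hosting, for example), the same setting can usually be applied at runtime. This is only a minimal sketch, assuming your host allows ini_set() for this directive. The variable name $charset is just for illustration.

```php
<?php
// Runtime equivalent of the php.ini line; run it before any output is sent.
ini_set('default_charset', 'UTF-8');

// PHP adds this charset to the Content-Type header it sends by default,
// e.g. "Content-Type: text/html; charset=UTF-8".
$charset = ini_get('default_charset');
```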

4. Convert incoming data from MySQL to UTF-8
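MySQL sends query results in the character set of the connection, not in whatever encoding PHP is using, and on many installations the connection defaults to latin1. To make sure you are getting utf-8, add the following two lines of code just after mysql_connect():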

mysql_query("SET NAMES 'utf8'");
mysql_query("SET CHARACTER SET 'utf8'");
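
The mysql_* functions were deprecated in PHP 5.5 and removed in PHP 7.0. On newer versions, the same step can be sketched with the mysqli extension, whose set_charset() method sets the connection character set. The function name connect_utf8 and its parameters are hypothetical placeholders, not part of any library.

```php
<?php
// Hypothetical helper: opens a mysqli connection and switches it to utf8.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    if ($link->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $link->connect_error);
    }

    // Sets the character set used for sending data to and from the server.
    if (!$link->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $link->error);
    }

    return $link;
}
```

You would call it once where you previously called mysql_connect(), passing your own host, user, password and database name.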