Support

Blog

As other web developers in China (and other non-latin language locations) probably know, UTF-8 is something that we need for Chinese to work correctly, however despite asking for UTF8 in the table structure, it doesn’t mean that MySQL or PHP is going to accede to our wishes.

Here is how we at Computer Solutions cope with forcing MySQL to UTF-8, despite its unwillingness!

First off, we need to configure the MySQL config via the my.cnf file to force the defaults to utf-8, instead of latin1.

Most systems my.cnf should be in either /etc/my.cnf or /etc/mysql/my.cnf (assuming *nix)

Make sure the following lines are added under mysqld:

[mysqld]
default-character-set=utf8
character-set-server=utf8
collation-server=utf8_general_ci

Restart mysql (debian or similar init.d should be /etc/init.d/mysql restart)

Lets check that worked –

mysql -u root -p
show variables like "%character%";

If you see results like this:

show variables like "%character%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

Then it worked.   If it doesn’t and you see something similar to below

mysql> show variables like "%character%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

Then we need to re-edit /etc/my.cnf

Add this under the [mysqld] lines we edited earlier.

skip-character-set-client-handshake

then restart mysql (/etc/init.d/mysql restart )   and recheck.

That forces MySQL to *use* what we <expletive deleted> well told it too!

That unfortunately isn’t the end of the story.  If like us, you’ve been merrily using UTF-8 data inside Latin1 encoded tables accidentally, you’ll still need to export your data in the correct format, and reimport it.

This can be done as follows:

Use MySQLDump with an explicit encoding passed to it:

mysqldump --user=username --password=password --default-character-set=latin1 \
 --skip-set-charset myscrewedupdb > dump.sql

This will give us a dump of the data in the correct format, albeit with the wrong encoding specified in the file.  You’ll need to edit the file to change latin1 to utf8   eg

sed -i "" 's/latin1/utf8/g' dump.sql

..and then re-import the data (I *strongly* suggest you do this to a NEW db for testing, as you DON’T want to lose data now, do you!)

mysql --user=username --password=password --default-character-set=utf8 \
newtestdbname < dump.sql

With a bit of luck, your data should be intact, and you should now be basking in the glory of UTF-8’ness, ready for the 20th Century etc!

Archives

Categories

Tags

PHOTOSTREAM