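mysqldump exports data in the wrong character set

Time: 2013-03-30 12:24:07

Tags: mysql utf-8 character-encoding mysqldump latin1

Yesterday I exported my MySQL database for the first time, and I found some very strange characters in the dump, such as: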

INSERT INTO `piwik_archive_blob_2013_01` VALUES (15,'Actions_actions_url_6',1,'2013-01-17','2013-01-17',1,'2013-01-20 07:36:53','xuNM0ý/œ#&ÝÕ³\ZõNYpÊÀì#!üw7Hж}°ÀAáZoN*šgµ\'GWª[Yûðe¯57 ÃÁÆ7|Ÿ\'Ü%µDh©-EÛ^ËL±ÕÞtªk@(,b±ßZ.ÒÃ6b²aiÓÍ)87[­ïÎœ,æya¥uÒ<|+íª7MNuïÝ¿8ñ%1Ʊ>Ú­X');

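My server's MySQL version is 5.1.66-0+squeeze1 (Debian). This database was created automatically by the Piwik installation script.

Here is what I tried in order to solve the problem:

#1 First, I checked the database character set.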

> show table status;

All 26 tables have the collation utf8_general_ci, which sounds normal. I guessed that mysqldump was exporting in a different character set (latin1?), so I tried:

mysqldump -u user -p**** --all-databases --default-character-set=utf8 | gzip -9 > dump.sql.gz

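Result = I still get the same strange characters.

Note) I later learned that mysqldump's default character set is utf8, regardless of the server's default character set. So --default-character-set=utf8 made no difference.

#2 Then I thought I could solve the problem by updating the MySQL configuration. The original configuration was: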

mysql> show variables like "%character%";show variables like "%collation%";

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+

So I updated /var/lib/mysql/my.cnf and added:

[mysqld]
init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_general_ci
default-character-set=utf8
default-collation=utf8_general_ci

[mysqldump]
default-character-set=utf8

Then:

/etc/init.d/mysql restart
mysql> show variables like "%character%";show variables like "%collation%";

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | utf8_general_ci   |
| collation_database   | latin1_swedish_ci |
| collation_server     | utf8_general_ci   |
+----------------------+-------------------+

Result = the same strange characters.

#3 I changed character_set_database and collation_database:

mysql> ALTER DATABASE piwik default character SET utf8 collate utf8_general_ci;

mysql> show variables like "%character%";show variables like "%collation%";

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+

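Result = the same strange characters.

#4 I now understand that I should have changed MySQL's default character set from latin1 to utf8 before creating the database.

The collation utf8_general_ci (#1) means the data is stored as utf8. However, is it possible that mysqldump believes the data is stored as latin1 and encodes it to utf8 again? That would mean the data ends up double utf8-encoded (sigh). In that case, how can I fix the problem?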
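
One way to recognize double encoding is by its byte pattern. The sketch below is an illustration only, assuming a POSIX shell with iconv and od available; the helper names to_hex and double_encode_hex are hypothetical and not part of MySQL. It takes bytes that are already UTF-8, misreads them as latin1, and converts them to UTF-8 a second time, which is the mistake described above.

```shell
#!/bin/sh
# Illustration only: simulate "double UTF-8 encoding".
# to_hex and double_encode_hex are hypothetical helper names.

# Print stdin as one continuous lowercase hex string.
to_hex() {
    od -An -tx1 | tr -d ' \n'
}

# Treat the UTF-8 bytes on stdin as if they were latin1, convert them
# to UTF-8 again, and show the resulting bytes in hex.
double_encode_hex() {
    iconv -f ISO-8859-1 -t UTF-8 | to_hex
}

# "é" is the two bytes C3 A9 in UTF-8 (written in octal for portability).
printf '\303\251' | to_hex; echo
printf '\303\251' | double_encode_hex; echo
```

The first line printed is c3a9 (correct UTF-8) and the second is c383c2a9 (each original byte expanded into its own two-byte sequence). Text columns in a dump that show the second pattern would point to double encoding; plain ASCII is unaffected either way.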

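Thanks for your help.

PS) I wonder why Piwik does not require changing the database default character set to utf8.

2 Answers:

Answer 0: (score: 0)

Judging by the table name "piwik_archive_blob_2013_01", I guess the column containing the strange characters is of type BLOB.

BLOB columns contain binary data. That is why the column contains these strange characters. This is expected.

Don't worry, I am pretty sure mysqldump knows how to dump this data.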
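
To check the guess about the column type, and to get a dump with no raw binary bytes in it, something like the following could be used. This is a sketch under assumptions, not a tested recipe for this server: the function names are hypothetical, and the database name piwik is taken from the ALTER DATABASE statement in the question. --hex-blob is a mysqldump option that writes binary columns in hexadecimal notation.

```shell
#!/bin/sh
# Hypothetical helpers; names and arguments are illustrative.

# List the data type and character set of every column in one table.
# Usage: show_column_types USER DATABASE TABLE
show_column_types() {
    mysql -u "$1" -p -e "
        SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_SET_NAME
        FROM information_schema.COLUMNS
        WHERE TABLE_SCHEMA = '$2' AND TABLE_NAME = '$3';"
}

# Dump one database with binary columns written as hex literals.
# Usage: dump_hex_blob USER DATABASE
dump_hex_blob() {
    mysqldump -u "$1" -p --hex-blob "$2"
}
```

For example, show_column_types user piwik piwik_archive_blob_2013_01 would show whether the column is a BLOB type (CHARACTER_SET_NAME is NULL for binary columns), and dump_hex_blob user piwik | gzip -9 > dump.sql.gz would produce a dump that is larger but contains only printable characters for those columns.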

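Cheers, Eric.

Answer 1: (score: 0)

It may be that the operating system changes the character set during the export and ignores the default-character-set parameter.

To make sure the export does not use the operating system's character set, use the result-file parameter.

See this post: http://nathan.rambeck.org/blog/1-preventing-encoding-issues-mysqldump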
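
The result-file suggestion above can be sketched as follows. The function name, the database name piwik, and the output path are assumptions for illustration. --result-file (short form -r) makes mysqldump write the output file itself instead of passing the data through shell redirection.

```shell
#!/bin/sh
# Hypothetical helper: dump DATABASE to OUTFILE without shell redirection.
# Usage: dump_to_file USER DATABASE OUTFILE
dump_to_file() {
    mysqldump -u "$1" -p \
        --default-character-set=utf8 \
        --result-file="$3" \
        "$2"
}
```

A call such as dump_to_file user piwik dump.sql followed by gzip -9 dump.sql would give a compressed dump comparable to the one in the question, with compression done as a separate step after mysqldump has finished writing.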