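I'm having trouble inserting/reading utf8 content to/from the db. All the checks I'm doing seem to indicate that the content in my database should be utf8 encoded, yet it appears to be Latin encoded. The data was originally imported by a PHP script run from the CLI.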
Zend Framework Version: 1.10.5
mysql-server-5.0: 5.0.51a-3ubuntu5.7
php5-mysql: 5.2.4-2ubuntu5.10
apache2: 2.2.8-1ubuntu0.16
libapache2-mod-php5: 5.2.4-2ubuntu5.10
-mysql:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_bin        |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
-database
created with
CREATE DATABASE mydb CHARACTER SET utf8 COLLATE utf8_bin;
CREATE SCHEMA `mydb` DEFAULT CHARACTER SET utf8 COLLATE utf8_bin ;
mysql> status;
--------------
mysql Ver 14.12 Distrib 5.0.51a, for debian-linux-gnu (i486) using readline 5.2
Connection id: 7
Current database: mydb
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.0.51a-3ubuntu5.7-log (Ubuntu)
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 9 min 45 sec
-sql: before doing the inserts I run
SET names 'utf8';
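-php: before doing the insert I use utf8_encode() and mb_detect_encoding(), which gives me 'UTF-8'. After retrieving the content from the db and before sending it to the user, mb_detect_encoding() also gives 'UTF-8'.
The only way I can get the content displayed correctly is to set the content type to Latin (if I sniff the traffic I can see the Content-Type header with ISO-8859-1):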
ini_set('default_charset', 'ISO-8859-1');
This test suggests the content is Latin encoded, and I don't understand why. Does anybody have any idea?
Thanks.
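A note on the utf8_encode() step described in the question: PHP's utf8_encode() treats its input as ISO-8859-1, so calling it on bytes that are already UTF-8 encodes them a second time. The result is still valid UTF-8, which means a detection check afterwards cannot flag it. The sketch below illustrates this in Python rather than PHP; the helper `php_style_utf8_encode` is a hypothetical stand-in that mimics that behavior, and nothing here confirms that this is what happened to the asker's data.

```python
def php_style_utf8_encode(data: bytes) -> bytes:
    """Hypothetical stand-in for PHP's utf8_encode():
    read the input as ISO-8859-1 and emit UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+00E9 (e with acute accent), already encoded as UTF-8: two bytes.
already_utf8 = "\u00e9".encode("utf-8")

# Encoding it again turns those two bytes into four.
double_encoded = php_style_utf8_encode(already_utf8)

print(already_utf8.hex())             # bytes before the second encoding
print(double_encoded.hex())           # bytes after the second encoding
print(is_valid_utf8(double_encoded))  # still passes a UTF-8 validity check
```

Comparing these byte patterns with the output of `SELECT HEX(column)` on a known accented value is one way to see what is actually stored.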
Answer 0 (score: 8)
Well, I've found that SET NAMES isn't that great. Check out the docs ...
What I usually do is execute 4 queries:
SET CHARACTER SET 'UTF8';
SET character_set_database = 'UTF8';
SET character_set_connection = 'UTF8';
SET character_set_server = 'UTF8';
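Give that a shot and see if it works for you...
Oh, and keep in mind that all UTF-8 characters <= 127 are also valid ISO-8859-1 characters. So if the stream contains only characters <= 127, mb_detect_encoding will settle on the higher-priority charset (which is 'UTF-8' by default)...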
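The overlap described above can be checked at the byte level. This is a minimal sketch in Python (not PHP), and `can_decode` is a hypothetical helper introduced only for illustration. It shows that ASCII-only input is valid under both encodings, so a detector has nothing to distinguish them by, while a lone ISO-8859-1 accented byte is not valid UTF-8.

```python
def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode without error in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


samples = {
    "ascii_only": b"hello",                           # every byte <= 127
    "utf8_e_acute": "\u00e9".encode("utf-8"),         # two bytes
    "latin1_e_acute": "\u00e9".encode("iso-8859-1"),  # one byte
}

# For each sample: (valid as UTF-8, valid as ISO-8859-1)
report = {
    name: (can_decode(raw, "utf-8"), can_decode(raw, "iso-8859-1"))
    for name, raw in samples.items()
}

for name, result in report.items():
    print(name, result)
```

Python's ISO-8859-1 codec accepts every byte value, so only the UTF-8 check can rule anything out. That is why a 'UTF-8' verdict on mostly-ASCII data says little about the accented characters in it.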
Answer 1 (score: 1)
What does
SHOW FULL COLUMNS FROM table;
show? A table having a default character set does not mean each column uses it. For example, this is valid:
CREATE TABLE test (
`name` varchar(10) character set latin1
) CHARSET=utf8
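To spot such overrides across many tables, the text returned by SHOW CREATE TABLE can be scanned. The following is a rough Python sketch, not a real SQL parser; `column_charset_overrides` is a hypothetical helper, and it assumes the DDL uses backtick-quoted column names, the `character set <name>` column syntax and a `CHARSET=<name>` table option, as in the example above.

```python
import re

# The table definition from the example above.
DDL = """CREATE TABLE test (
  `name` varchar(10) character set latin1
) CHARSET=utf8"""


def column_charset_overrides(ddl: str) -> dict:
    """Return {column: charset} for columns whose declared character set
    differs from the table default. Regex-based, so only a rough check."""
    # Table-level default, e.g. "CHARSET=utf8" or "DEFAULT CHARSET=utf8".
    match = re.search(r"CHARSET=(\w+)", ddl, re.IGNORECASE)
    table_default = match.group(1).lower() if match else None

    overrides = {}
    # Column-level declarations, e.g. "`name` varchar(10) character set latin1".
    pattern = r"`(\w+)`[^,\n]*?character set (\w+)"
    for column, charset in re.findall(pattern, ddl, re.IGNORECASE):
        if charset.lower() != table_default:
            overrides[column] = charset.lower()
    return overrides


print(column_charset_overrides(DDL))
```

For a single table, reading the Collation column of SHOW FULL COLUMNS directly is simpler; a scan like this only helps when checking a whole schema dump.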