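Can't get UTF-8 special characters to write correctly to MySQL (PHP)

Date: 2015-05-18 01:03:33

Tags: php mysql encoding utf-8

I'm writing a PHP script that runs from the command line on a typical LAMP stack (where L = OS X), and I'm having a lot of trouble getting special characters recorded correctly in the database.

The script recursively scans a directory and inserts each full path into a MySQL table. I've done a lot of research on how to get special characters written to MySQL, but they still show up as ? characters.

Here is the code: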

<?PHP
ini_set('default_charset', 'UTF-8');


$link = mysql_connect('localhost', '--USER--', '--PASSWORD--');
mysql_set_charset('utf8',$link);

if (!$link) {
   die('Could not connect: ' . mysql_error());
}

if(!mysql_select_db("files")) {
   die('Could not connect: ' . mysql_error());
}

mysql_query("SET NAMES utf8");
mysql_query("SET CHARACTER SET utf8");

function startsWith($haystack, $needle) {
    return $needle === "" || strrpos($haystack, $needle, -strlen($haystack)) !== FALSE;
}

function getDirContents($dir, &$results = array()) {
  $files = scandir($dir);
    foreach($files as $key => $value) {
            $path = realpath($dir.DIRECTORY_SEPARATOR.$value);
            if(startsWith($path,'/Volumes/Macintosh HD/')) {
                    unset($files[$key]);
            } else if(!is_dir($path) && !startsWith($value,'.') && startsWith($path,'/Volumes/')) {
                    $results[] = $path;
                    $query="INSERT IGNORE INTO files (path,dir) VALUES ('$path','0')";
                    mysql_query($query);
            } else if(is_dir($path) && !startsWith($value,'.') && startsWith($path,'/Volumes/')) {
                    getDirContents($path, $results);
                    $results[] = $path;
                    $query="INSERT IGNORE INTO files (path,dir) VALUES ('$path','1')";
                    mysql_query($query);
            }
    }
    return $results;
}


$directory='/Volumes'; 
$files=getDirContents($directory);
sort($files);
print_r($files);

?>

The problematic path is:

/Volumes/Mac Stadium Shuttle 1/DIG2008060702/files/Susan-Jürgen.dvdproj/Contents/PkgInfo

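Note the umlaut in Jürgen. When the script prints all the files in the array, the ü displays correctly.

If I add a line to the PHP script that prints the query string passed to mysql_query(), it outputs the following: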

INSERT IGNORE INTO files (path,dir) VALUES ('/Volumes/Mac Stadium Shuttle 1/DIG2008060702/files/Susan-Jürgen.dvdproj/Contents/PkgInfo','0')

Again, the ü displays correctly.

From the MySQL command-line client, I SELECT this path:

mysql> select * from files where path like '%susan%';

...and the response:

+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----------+------+---------------+
| ID     | path                                                                                                                                                                  | dir  | google_id | md5  | deleted_local |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----------+------+---------------+
| 644990 | /Volumes/Mac Stadium Shuttle 1/DIG2008060702/files/Susan-Ju?rgen.dvdproj/Contents/PkgInfo                                                                             | 0    | NULL      | NULL | 0             |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----------+------+---------------+

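...note that the ü in Jürgen shows up as u? (Ju?rgen).

I've tried to make sure that:

  • php.ini's default charset is UTF-8
  • the table's default character set is utf8
  • the database connection is defined as a utf8 connection

I added the ini_set() call near the top of this script (together with phpinfo();) and ran it from the CLI. default_charset => UTF-8 => UTF-8 appears in the output.

After connecting to the database in the script, I added echo mysql_client_encoding($link); and the script printed utf8.

I also ran: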

mysql> show variables like 'char%';

Response:

 +--------------------------+--------------------------------------------------------+
 | Variable_name            | Value                                                  |
 +--------------------------+--------------------------------------------------------+
 | character_set_client     | utf8                                                   |
 | character_set_connection | utf8                                                   |
 | character_set_database   | utf8                                                   |
 | character_set_filesystem | binary                                                 |
 | character_set_results    | utf8                                                   |
 | character_set_server     | utf8                                                   |
 | character_set_system     | utf8                                                   |
 | character_sets_dir       | /usr/local/mysql-5.6.24-osx10.8-x86_64/share/charsets/ |
 +--------------------------+--------------------------------------------------------+
 8 rows in set (0.05 sec)

So, what am I doing wrong?

Edit: the table structure is:

 mysql> DESCRIBE files;
 +---------------+------------------+------+-----+---------+----------------+
 | Field         | Type             | Null | Key | Default | Extra          |
 +---------------+------------------+------+-----+---------+----------------+
 | ID            | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
 | path          | varchar(510)     | YES  | UNI | NULL    |                |
 | dir           | enum('0','1')    | YES  |     | 0       |                |
 | google_id     | varchar(255)     | YES  |     | NULL    |                |
 | md5           | varchar(255)     | YES  |     | NULL    |                |
 | deleted_local | enum('0','1')    | YES  |     | 0       |                |
 +---------------+------------------+------+-----+---------+----------------+
 6 rows in set (0.00 sec)

Another edit:

 mysql> show create table files;
 +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | Table | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
 +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | files | CREATE TABLE `files` (
      `ID` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `path` varchar(510) CHARACTER SET latin1 DEFAULT NULL,
      `dir` enum('0','1') CHARACTER SET latin1 DEFAULT '0',
      `google_id` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
      `md5` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
      `deleted_local` enum('0','1') CHARACTER SET latin1 DEFAULT '0',
      PRIMARY KEY (`ID`),
      UNIQUE KEY `path` (`path`)
 ) ENGINE=InnoDB AUTO_INCREMENT=961879 DEFAULT CHARSET=utf8 |
  +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  1 row in set (0.04 sec)

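2 answers:

Answer 0: (score: 2)

As your second edit shows, the path column has the latin1 character set even though the table defaults to utf8. Perhaps you got into this state by altering an existing table?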
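The u? in the SELECT output hints at why a latin1 column matters here. A likely explanation, which the question does not confirm, is that the OS X filesystem returned the file name with ü in decomposed form: a plain u followed by U+0308 (combining diaeresis). latin1 has no code for U+0308, so that character is replaced with ? while the u survives. The sketch below reproduces the effect in PHP alone. It assumes the mbstring extension is available and uses ISO-8859-1 as a stand-in for MySQL's latin1; the variable names are illustrative.

```php
<?php
// Assumption: the filesystem handed PHP the decomposed spelling of "ü",
// i.e. the letter "u" followed by U+0308 COMBINING DIAERESIS.
$decomposed  = "Ju\u{0308}rgen";
// The precomposed spelling uses the single code point U+00FC instead.
$precomposed = "J\u{00FC}rgen";

// Make the replacement character explicit: "?" (0x3F) is used for any
// character the target encoding cannot represent.
mb_substitute_character(0x3F);

// ISO-8859-1 stands in for the latin1 column here.
$asLatin1Decomposed  = mb_convert_encoding($decomposed, 'ISO-8859-1', 'UTF-8');
$asLatin1Precomposed = mb_convert_encoding($precomposed, 'ISO-8859-1', 'UTF-8');

echo $asLatin1Decomposed, "\n";            // Ju?rgen
echo bin2hex($asLatin1Precomposed), "\n";  // 4afc7267656e (ü fits as byte FC)
```

Under this assumption only decomposed names are affected: a precomposed ü fits in latin1, which may be why the connection and table settings all looked correct.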

Try ALTER TABLE files MODIFY path VARCHAR(510) CHARACTER SET utf8;
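The SHOW CREATE TABLE output in the question shows that path is not the only column with a latin1 override. As a rough sketch, the columns that differ from the intended character set could be listed from information_schema.COLUMNS and filtered in PHP. The helper name columnsNotUsing is hypothetical, the query is shown but not executed, and the sample rows are transcribed by hand from the question rather than fetched from a server.

```php
<?php
// Query listing the character set of each text column. The schema and table
// are both named "files" in the question. With mysqli the rows could be
// fetched via $db->query(COLUMN_CHARSET_SQL)->fetch_all(MYSQLI_ASSOC).
const COLUMN_CHARSET_SQL = <<<'SQL'
SELECT COLUMN_NAME, CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'files' AND TABLE_NAME = 'files'
  AND CHARACTER_SET_NAME IS NOT NULL
SQL;

/**
 * Hypothetical helper: return [column => charset] for every row whose
 * character set differs from $expected.
 */
function columnsNotUsing(array $rows, string $expected): array
{
    $mismatched = [];
    foreach ($rows as $row) {
        if ($row['CHARACTER_SET_NAME'] !== $expected) {
            $mismatched[$row['COLUMN_NAME']] = $row['CHARACTER_SET_NAME'];
        }
    }
    return $mismatched;
}

// Sample rows transcribed from the SHOW CREATE TABLE output in the question.
$rows = [
    ['COLUMN_NAME' => 'path',          'CHARACTER_SET_NAME' => 'latin1'],
    ['COLUMN_NAME' => 'dir',           'CHARACTER_SET_NAME' => 'latin1'],
    ['COLUMN_NAME' => 'google_id',     'CHARACTER_SET_NAME' => 'latin1'],
    ['COLUMN_NAME' => 'md5',           'CHARACTER_SET_NAME' => 'latin1'],
    ['COLUMN_NAME' => 'deleted_local', 'CHARACTER_SET_NAME' => 'latin1'],
];

$mismatched = columnsNotUsing($rows, 'utf8');
print_r($mismatched);
```

Rows that were already stored with a literal ? keep it after the ALTER, so the affected paths would need to be scanned and inserted again.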

Answer 1: (score: 0)

1. Set the collation of the database table's fields to utf8_unicode_ci

2. Change the meta tag.

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

3. You can use echo utf8_encode($value); in your page.
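A caution on the utf8_encode() suggestion: that function converts ISO-8859-1 input to UTF-8, so applying it to a string that is already UTF-8, as the paths in the question appear to be, encodes it a second time. The sketch below illustrates this using mb_convert_encoding() to perform the same conversion, since utf8_encode() is deprecated as of PHP 8.2. It assumes the mbstring extension, and the variable names are illustrative.

```php
<?php
// "Jürgen" as UTF-8: ü is the two bytes C3 BC.
$alreadyUtf8 = "J\u{00FC}rgen";

// Same conversion utf8_encode() performs: treat every input byte as an
// ISO-8859-1 character and re-encode it as UTF-8.
$doubleEncoded = mb_convert_encoding($alreadyUtf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($alreadyUtf8), "\n";    // 4ac3bc7267656e
echo bin2hex($doubleEncoded), "\n";  // 4ac383c2bc7267656e
```

The two bytes of ü become four, which a UTF-8 terminal or browser would render as two unrelated characters instead of ü.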