sqlsrv_fetch_array returns strings as UTF-8 instead of Latin1
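The string data stored in MS SQL Server is Latin1. Database tools such as MS VS and AquaDataStudio display VARCHAR data as Latin1, which is how it is stored in the database, and the existing PHP 5 code (using FreeTDS) also displays it correctly. This matters because some fields contain binary data stored with a Base128 encoding scheme that uses single-byte printable characters in the decimal range 128-255, and the PHP ORD function is used to decode those VARCHAR strings.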
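Now we have to move to PHP 7, and we are using the Microsoft driver (the functions starting with "sqlsrv", such as sqlsrv_connect, sqlsrv_query, sqlsrv_fetch_array, etc.). These functions do not return strings (VARCHAR) as Latin1 but as multibyte UTF-8, which has two consequences:
1) the returned strings are longer than the originals (characters in the 128-255 range come back as two or three bytes), so validation checks fail and/or strings get truncated, and
2) the PHP ORD function returns correct values only for characters in the decimal range 0-127, so the decoding/decryption code fails.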
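Using PHP 5, I created a table in a local database with one record for each character in the range 1-255. When I read the records back in PHP 5 and display each character with its ORD() value, the results are correct. The PHP 7 version of the code displays characters 1-127 correctly, but characters 128-255 come back as question marks, as single-byte characters with the wrong ORD() value, or as two- or three-byte strings for which ORD() returns an incorrect value.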
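The effect on ORD() can be reproduced without a database. The following is a minimal pure-PHP sketch, assuming the column uses code page 1252 (the code page of SQL_Latin1_General_CP1_CI_AS) and that the mbstring extension is loaded; inspect_byte() is a hypothetical helper. It converts one stored byte to UTF-8, as the PHP 7 driver appears to do, and then back to a single byte.

```php
<?php
// Hypothetical helper: shows what happens to one byte stored in a code page
// 1252 VARCHAR column when it is handed to PHP as UTF-8, and converts it back.
function inspect_byte(int $code): array
{
    $stored = chr($code);                                          // byte as stored in the column
    $utf8 = mb_convert_encoding($stored, 'UTF-8', 'Windows-1252'); // what the PHP 7 driver returns
    $back = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');   // single byte again
    return [
        'utf8_length'    => strlen($utf8),
        'utf8_first_ord' => ord($utf8),   // ord() only looks at the first byte
        'restored_ord'   => ord($back),
    ];
}

foreach ([65, 128, 169, 233] as $code) {
    $info = inspect_byte($code);
    printf("%d: %d byte(s), ord()=%d, restored ord()=%d\n",
        $code, $info['utf8_length'], $info['utf8_first_ord'], $info['restored_ord']);
}
```

For 169 (©) the UTF-8 form is two bytes and ord() sees only the lead byte 194, which matches the PHP 7 sample output; converting back restores 169. Code point 128 (the euro sign in code page 1252) becomes three bytes.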
The problem seems to be that sqlsrv_fetch_array forces all string data to UTF-8 and returns only UTF-8.
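In addition, the application consists of more than five million lines of PHP 5 code, so going through every field returned by every query and applying iconv() to it is impractical.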
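If the existing code reads rows through a small number of call sites (or they can be redirected with a search-and-replace), the conversion could live in one place instead of in every query. A minimal sketch, assuming code page 1252 as the target and the mbstring extension; row_to_cp1252() and db_fetch_array() are hypothetical names, not part of the sqlsrv API. VARBINARY columns would also arrive as PHP strings and would need to be excluded, because converting raw bytes corrupts them.

```php
<?php
// Hypothetical helper: convert every string value in a fetched row from
// UTF-8 (what the sqlsrv driver returns) back to Windows-1252 bytes.
// Non-string values (int, DateTime, null) are left untouched.
function row_to_cp1252(array $row): array
{
    foreach ($row as $key => $value) {
        if (is_string($value)) {
            $row[$key] = mb_convert_encoding($value, 'Windows-1252', 'UTF-8');
        }
    }
    return $row;
}

// Hypothetical wrapper the application would call instead of
// sqlsrv_fetch_array(); passes through null (no more rows) and false (error).
function db_fetch_array($stmt, ?int $fetchType = null)
{
    $row = sqlsrv_fetch_array($stmt, $fetchType ?? SQLSRV_FETCH_ASSOC);
    return is_array($row) ? row_to_cp1252($row) : $row;
}
```

The trade-off is one extra conversion per string value on every fetch, and characters with no code page 1252 equivalent become "?".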
How can I get Latin1 data returned in PHP 7 from a database in which it is actually stored as Latin1?
Nothing I have tried so far works in PHP 7 with the Microsoft "sqlsrv" functions. Some of the things I tried:
SELECT * FROM z_php5_charset ORDER BY id
COLLATE Latin1_General_CP1_CI_AS
...returns the error "Expression type int is invalid for COLLATE clause". Changing it to:
COLLATE SQL_Latin1_General_CP1_CI_AS
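The COLLATE error comes from attaching the clause to ORDER BY id, because id is an int and COLLATE has to follow a character expression. Even in the right place it most likely will not help, since the driver converts whatever code page the column uses into its own client encoding. Casting the column to VARBINARY should sidestep that conversion, because the driver returns binary columns as raw byte strings. A minimal sketch of such a query, assuming the table has the columns id and charset and that VARBINARY(8000) is long enough; charset_test_query() is a hypothetical helper.

```php
<?php
// Hypothetical helper that builds the test query.
// - COLLATE follows the character column, not the int column id.
// - CAST(... AS VARBINARY(8000)) returns the stored bytes unconverted
//   (8000 is an assumed upper bound for the VARCHAR length).
function charset_test_query(): string
{
    return 'SELECT id, '
        . 'charset COLLATE SQL_Latin1_General_CP1_CI_AS AS charset_text, '
        . 'CAST(charset AS VARBINARY(8000)) AS charset_bytes '
        . 'FROM z_php5_charset ORDER BY id';
}

echo charset_test_query(), "\n";
```

With this query, ord($row['charset_bytes']) should equal $row['id'] for every row, while charset_text would likely still come back as UTF-8.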
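Thinking it might be related to the "mbstring" settings in php.ini, we installed that module and tried at least a dozen combinations of settings involving variants of "English" and "ISO-8859-1" (everything set to "English", everything set to "ISO-8859-1", and so on), restarting Apache between each test. Some combinations produced fewer question marks, but the output was still UTF-8.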
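The mbstring ini settings most likely have no effect here, because the conversion happens inside the sqlsrv/ODBC driver before PHP sees the string. The driver's own setting is the CharacterSet connection option: with SQLSRV_ENC_CHAR (the default) character data is returned in the client's 8-bit encoding rather than UTF-8, and on Linux that encoding appears to follow the process locale, which is commonly UTF-8. A minimal sketch, assuming a Linux host where a single-byte locale named en_US.ISO-8859-1 has been generated and a driver version that honours the locale; connect_single_byte() is a hypothetical helper and the locale name is an assumption. An ISO-8859-1 locale covers 160-255 but not the code page 1252 characters in 128-159, which may still come back as question marks.

```php
<?php
// Hypothetical helper: open a sqlsrv connection that asks for single-byte
// (non-UTF-8) strings. Assumptions: Linux host, the given locale is
// installed, and the sqlsrv/ODBC driver honours the process locale.
function connect_single_byte(string $server, array $connectionInfo, string $locale = 'en_US.ISO-8859-1')
{
    // setlocale() returns false when the locale is not installed.
    if (setlocale(LC_ALL, $locale) === false) {
        throw new RuntimeException("Locale $locale is not available on this system");
    }
    // SQLSRV_ENC_CHAR is the driver default; set it explicitly so that a
    // 'CharacterSet' => 'UTF-8' entry in the caller's array is overwritten.
    $connectionInfo['CharacterSet'] = SQLSRV_ENC_CHAR;

    $conn = sqlsrv_connect($server, $connectionInfo);
    if ($conn === false) {
        throw new RuntimeException(print_r(sqlsrv_errors(), true));
    }
    return $conn;
}
```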
Minimal code (PHP 7 version):
$db_con = @sqlsrv_connect($server, $connectionInfo);
if ($db_con === false) {
    die(print_r(sqlsrv_errors(), true));
}

// Query used to check the code page of the collation:
// $sql = "SELECT COLLATIONPROPERTY('SQL_Latin1_General_CP1_CI_AS', 'CodePage');";
$sql = 'SELECT * FROM z_php5_charset ORDER BY id';
// $sql .= ' COLLATE Latin1_General_CP1_CI_AS';  // fails: COLLATE cannot be applied to an int
$parms = array();
$options = array();
$result_id = sqlsrv_query($db_con, $sql, $parms, $options);
if ($result_id === false) {
    die(print_r(sqlsrv_errors(), true));
}
while ($row = sqlsrv_fetch_array($result_id, SQLSRV_FETCH_ASSOC)) {
    $c = $row['id'];
    $char = $row['charset'];
    $outbuf = $c.': '.$char.' '.ord($char);
    if (strlen($char) > 1) {
        $outbuf .= ' --- more than one char returned ['.strlen($char).' chars returned]';
    }
    echo $outbuf."\n";
} // end while
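Another option, limited to the columns that carry the Base128 data, is to ask the driver to skip the conversion for a single field. sqlsrv_get_field() accepts a PHP type such as SQLSRV_PHPTYPE_STRING(SQLSRV_ENC_BINARY), which should return the column's bytes as stored. A minimal sketch of the test loop rewritten that way, assuming the columns id and charset; dump_charset_raw() is a hypothetical name.

```php
<?php
// Hypothetical variant of the test loop: read the charset column with
// binary encoding so the driver returns the stored byte(s) unconverted.
function dump_charset_raw($db_con): void
{
    $stmt = sqlsrv_query($db_con, 'SELECT id, charset FROM z_php5_charset ORDER BY id');
    if ($stmt === false) {
        throw new RuntimeException(print_r(sqlsrv_errors(), true));
    }
    // sqlsrv_fetch() returns true per row, null when done, false on error.
    while (sqlsrv_fetch($stmt) === true) {
        // sqlsrv_get_field() reads fields left to right, so id (index 0) comes first.
        $id = sqlsrv_get_field($stmt, 0);
        $char = sqlsrv_get_field($stmt, 1, SQLSRV_PHPTYPE_STRING(SQLSRV_ENC_BINARY));
        $outbuf = $id . ': ' . $char . ' ' . ord($char);
        if (strlen($char) > 1) {
            $outbuf .= ' --- more than one char returned [' . strlen($char) . ' chars returned]';
        }
        echo $outbuf . "\n";
    }
}
```

sqlsrv_fetch_array() has no per-column type parameter, which is why this variant switches to sqlsrv_fetch() plus sqlsrv_get_field().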
Please note:
Column 1 is the ordinal value of the character that a PHP program wrote to the VARCHAR field in the database.
Column 2 is the character returned when it is read back from the database.
Column 3 is the value returned by the PHP ORD() function.
Column 4 (if present) is additional information about the character retrieved.
Sample of PHP 5 output:
160: † 160
161: ° 161
162: ¢ 162
163: £ 163
164: § 164
165: • 165
166: ¶ 166
167: ß 167
168: ® 168
169: © 169
170: ™ 170
171: ´ 171
172: ¨ 172
173: ≠ 173
174: Æ 174
175: Ø 175
176: ∞ 176
177: ± 177
178: ≤ 178
179: ≥ 179
Sample of PHP 7 output:
160: Â 194 --- more than one char returned [2 chars returned]
161: ¡ 194 --- more than one char returned [2 chars returned]
162: ¢ 194 --- more than one char returned [2 chars returned]
163: £ 194 --- more than one char returned [2 chars returned]
164: ¤ 194 --- more than one char returned [2 chars returned]
165: ¥ 194 --- more than one char returned [2 chars returned]
166: ¦ 194 --- more than one char returned [2 chars returned]
167: § 194 --- more than one char returned [2 chars returned]
168: ¨ 194 --- more than one char returned [2 chars returned]
169: © 194 --- more than one char returned [2 chars returned]
170: ª 194 --- more than one char returned [2 chars returned]
171: « 194 --- more than one char returned [2 chars returned]
172: ¬ 194 --- more than one char returned [2 chars returned]
173: Â 194 --- more than one char returned [2 chars returned]
174: ® 194 --- more than one char returned [2 chars returned]
175: ¯ 194 --- more than one char returned [2 chars returned]
176: ° 194 --- more than one char returned [2 chars returned]
177: ± 194 --- more than one char returned [2 chars returned]
178: ² 194 --- more than one char returned [2 chars returned]
179: ³ 194 --- more than one char returned [2 chars returned]