PHP 7 sqlsrv_fetch_array returns Latin1 VARCHAR strings as UTF-8 instead of the required Latin1

Time: 2019-05-17 19:34:07

Tags: php sql-server

The sqlsrv_fetch_array function returns strings as UTF-8 instead of Latin1.

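The string data stored in MS SQL Server uses Latin1. When database tools such as MS VS or Aqua Data Studio display the VARCHAR data, it appears as Latin1, which is how it is stored in the database. The existing PHP 5 code (using FreeTDS) also displays the data correctly. This matters because some fields contain binary data stored with a Base128 encoding scheme that uses single-byte printable characters in the decimal range 128-255; the VARCHAR strings are decoded with PHP's ord() function.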

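We now have to move to PHP 7 and are using the Microsoft driver (the functions prefixed with "sqlsrv", such as sqlsrv_connect, sqlsrv_query and sqlsrv_fetch_array). These do not return strings (VARCHAR) as Latin1 but as multibyte UTF-8, which has two consequences: 1) the returned strings are longer than the originals, because every character above 127 becomes two or three bytes, so validation checks fail and/or strings are truncated; and 2) PHP's ord() returns the correct value only for characters in the decimal range 0-127, so the decoding/decryption code fails.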

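Using PHP 5, I created a table in a local database containing one record for each character in the range 1-255. When I read the records back in PHP 5 and display each character with its ord() value, the results are correct. The PHP 7 version of the code displays characters 1-127 correctly, but 128-255 come back as question marks, as single-byte characters with the wrong ord() value, or as two- or three-byte strings for which ord() returns an incorrect value.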
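The pattern in the PHP 7 sample below (two bytes and ord() = 194 for 160-179) is exactly what UTF-8 encoding produces: code points U+0080 to U+00BF start with byte 0xC2 (194), U+00C0 to U+00FF with 0xC3 (195), and the extra characters that Windows-1252 places in 128-159, such as € (U+20AC), take three bytes. A minimal sketch that reproduces this locally, assuming the iconv extension is available; utf8_profile() is a hypothetical helper, and CP1252 is used because SQL Server's Latin1_General collations use code page 1252.

```php
<?php
// Re-encodes one Windows-1252 byte as UTF-8 and reports the resulting
// length in bytes and the value ord() would return for it.
function utf8_profile(int $code): array
{
    $utf8 = iconv('CP1252', 'UTF-8', chr($code));
    if ($utf8 === false) {
        // A few bytes (e.g. 0x81) are undefined in CP1252 and may be rejected.
        return [0, 0];
    }
    return [strlen($utf8), ord($utf8)];
}

foreach ([65, 128, 160, 179, 233] as $code) {
    list($len, $first) = utf8_profile($code);
    printf("%3d -> %d byte(s), ord() = %d\n", $code, $len, $first);
}
```

Here 160 and 179 both give two bytes starting with 194, matching the PHP 7 output, 233 (é) starts with 195, and 128 gives three bytes.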

The problem appears to be that sqlsrv_fetch_array forces all string data to UTF-8 and returns only UTF-8.

Switching to UTF-8 is not an option, because the data is not ours and we only have read access to the database.

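In addition, the application consists of more than five million lines of PHP 5 code, so going through every field returned by every query and running it through iconv() is impractical.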
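For comparison, here is a sketch of what the iconv() conversion involves when it is centralised in one function rather than applied field by field. row_to_cp1252() and fetch_array_cp1252() are hypothetical names. The sketch converts to CP1252 rather than ISO-8859-1 because SQL Server's Latin1_General collations use code page 1252, whose characters 128-159 (for example €) do not exist in ISO-8859-1. It assumes the driver's UTF-8 is a faithful re-encoding of the stored bytes: characters that already arrive as question marks cannot be recovered, and the byte values CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) may not survive the round trip.

```php
<?php
// Converts every string field of a fetched row from UTF-8 back to
// single-byte Windows-1252; non-string values (ints, DateTime) are untouched.
function row_to_cp1252(array $row): array
{
    foreach ($row as $key => $value) {
        if (is_string($value)) {
            $converted = iconv('UTF-8', 'CP1252', $value);
            if ($converted === false) {
                throw new UnexpectedValueException("Field '$key' cannot be represented in CP1252");
            }
            $row[$key] = $converted;
        }
    }
    return $row;
}

// Hypothetical drop-in for sqlsrv_fetch_array(); passes through null (no more
// rows) and false (error) unchanged, so existing while-loops keep working.
function fetch_array_cp1252($stmt, int $fetchType = SQLSRV_FETCH_ASSOC)
{
    $row = sqlsrv_fetch_array($stmt, $fetchType);
    return is_array($row) ? row_to_cp1252($row) : $row;
}
```

Usage would be `while ($row = fetch_array_cp1252($result_id)) { ... }`, which still means touching every fetch call, the cost described above.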

How can I get Latin1 data returned in PHP 7 from a database where it is actually stored as Latin1?

In PHP 7 with the Microsoft sqlsrv functions, nothing I have tried so far has worked. Some of the things I tried:

    SELECT COLLATIONPROPERTY('SQL_Latin1_General_CP1_CI_AS','CodePage')

...had no effect, probably because this is already the database default.

    SELECT * FROM z_php5_charset ORDER BY id
    COLLATE Latin1_General_CP1_CI_AS

...returned the error "Expression type int is invalid for COLLATE clause." Changing it to:

    COLLATE SQL_Latin1_General_CP1_CI_AS

...returned the same error.
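COLLATE applies to the character expression immediately before it, so in `ORDER BY id COLLATE ...` it is attached to the int column id, which is what the error reports. A minimal sketch that attaches it to the VARCHAR column instead; charset_query() is a hypothetical helper, and the table and column names are taken from the test code below. This only avoids the type error: collation governs comparison, sorting and the server-side code page, and nothing here shows that it changes the encoding the driver uses when handing strings to PHP.

```php
<?php
// Builds the diagnostic query with COLLATE on the VARCHAR column.
function charset_query(string $collation): string
{
    // The collation name is concatenated into the SQL text (it cannot be a
    // bound parameter), so accept only plain identifier characters.
    if (!preg_match('/^\w+$/', $collation)) {
        throw new InvalidArgumentException("Unexpected collation name: $collation");
    }
    return "SELECT id, charset COLLATE $collation AS charset\n"
         . "FROM z_php5_charset\n"
         . "ORDER BY id";
}

echo charset_query('SQL_Latin1_General_CP1_CI_AS'), "\n";
```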

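Thinking it might be related to the mbstring settings in php.ini, we installed that module and tried at least a dozen combinations of settings involving variants of "English" and "ISO-8859-1" (everything set to "English", everything set to "ISO-8859-1", and so on), restarting Apache between each test. Although some combinations produced fewer question marks, the output was still UTF-8.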

    mb_internal_encoding('ISO-8859-1');

...also had no effect.
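mb_internal_encoding() and the mbstring php.ini settings only affect PHP's own mb_* functions, so they would not be expected to change what the driver returns. The driver's own setting is the CharacterSet connection option, whose documented values are SQLSRV_ENC_CHAR (the default) and "UTF-8"; if the elided $connectionInfo sets "UTF-8", that alone would explain the output. If the server runs Linux (the question does not say), the Microsoft driver is also reported to convert non-Unicode data according to the process locale, which is usually UTF-8. The sketch below is an untested assumption, not a known fix: connect_single_byte() is a hypothetical name, the locale names must exist on the host, and even an ISO-8859-1 locale cannot represent CP1252's characters 128-159 such as €.

```php
<?php
// Sketch only, untested: assumes the Microsoft sqlsrv extension is loaded and
// that the driver converts VARCHAR data using the process locale.
function connect_single_byte(string $server, array $connectionInfo)
{
    // LC_CTYPE is the locale category that governs character encoding.
    // These locale names are assumptions; check what `locale -a` lists.
    $locale = setlocale(LC_CTYPE, 'en_US.ISO-8859-1', 'en_US.iso88591');
    if ($locale === false) {
        trigger_error('No Latin-1 locale available', E_USER_WARNING);
    }

    // SQLSRV_ENC_CHAR is the documented default; setting it explicitly
    // overrides any 'CharacterSet' => 'UTF-8' entry passed in by the caller.
    $connectionInfo['CharacterSet'] = SQLSRV_ENC_CHAR;

    $conn = sqlsrv_connect($server, $connectionInfo);
    if ($conn === false) {
        throw new RuntimeException(print_r(sqlsrv_errors(), true));
    }
    return $conn;
}
```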


Minimum code (PHP 7 version):

    …
    mb_internal_encoding('ISO-8859-1');   // tried; no effect on sqlsrv output
    $db_con = sqlsrv_connect($server, $connectionInfo);
    if ($db_con === false) {
        die(print_r(sqlsrv_errors(), true));
    }
    // Earlier attempt, kept for reference:
    // $sql = "SELECT COLLATIONPROPERTY('SQL_Latin1_General_CP1_CI_AS','CodePage')";
    $sql = "SELECT * FROM z_php5_charset ORDER BY id\n";
    $result_id = sqlsrv_query($db_con, $sql, $parms, $options);
    if ($result_id === false) {
        die(print_r(sqlsrv_errors(), true));
    }
    while ($row = sqlsrv_fetch_array($result_id, SQLSRV_FETCH_ASSOC)) {
        $c = $row['id'];          // ordinal originally written to the VARCHAR column
        $char = $row['charset'];  // value as returned by the driver
        $outbuf = $c.': '.$char.'  '.ord($char);
        if (strlen($char) > 1) {  // strlen() counts bytes, not characters
            $outbuf .= ' --- more than one char returned ['.strlen($char).' chars returned]';
        }
        echo $outbuf."\n";
    }  // end while
Please note:

Column 1 is the ordinal value of the character that a PHP program wrote to the VARCHAR field in the database.

Column 2 is the character returned when it is retrieved from the database.

Column 3 is the value returned by PHP's ord() function.

Column 4 (if present) gives additional information about the retrieved character.

    Sample of PHP 5 output:
    160: †  160
    161: °  161
    162: ¢  162
    163: £  163
    164: §  164
    165: •  165
    166: ¶  166
    167: ß  167
    168: ®  168
    169: ©  169
    170: ™  170
    171: ´  171
    172: ¨  172
    173: ≠  173
    174: Æ  174
    175: Ø  175
    176: ∞  176
    177: ±  177
    178: ≤  178
    179: ≥  179


    Sample of PHP 7 output:
    160: Â   194 --- more than one char returned [2 chars returned]
    161: ¡  194 --- more than one char returned [2 chars returned]
    162: ¢  194 --- more than one char returned [2 chars returned]
    163: £  194 --- more than one char returned [2 chars returned]
    164: ¤  194 --- more than one char returned [2 chars returned]
    165: ¥  194 --- more than one char returned [2 chars returned]
    166: ¦  194 --- more than one char returned [2 chars returned]
    167: §  194 --- more than one char returned [2 chars returned]
    168: ¨  194 --- more than one char returned [2 chars returned]
    169: ©  194 --- more than one char returned [2 chars returned]
    170: ª  194 --- more than one char returned [2 chars returned]
    171: «  194 --- more than one char returned [2 chars returned]
    172: ¬  194 --- more than one char returned [2 chars returned]
    173: ­  194 --- more than one char returned [2 chars returned]
    174: ®  194 --- more than one char returned [2 chars returned]
    175: ¯  194 --- more than one char returned [2 chars returned]
    176: °  194 --- more than one char returned [2 chars returned]
    177: ±  194 --- more than one char returned [2 chars returned]
    178: ²  194 --- more than one char returned [2 chars returned]
    179: ³  194 --- more than one char returned [2 chars returned]

0 answers