Question

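I'm having a problem with regular expressions and Cyrillic characters. I'm simply trying to read a file, run preg_match_all on it, and display the result with var_dump, as shown below: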

...
$regex = '/"(.*)"/im';
$content = file_get_contents($file->getRealPath());
$filename = $file->getClientOriginalName();

preg_match_all($regex, $content, $matches);

return var_dump($matches[0]);

Sample output: 35=B 04<8=8AB@8@>20=8O Kaspersky Security Center 10
String in the file: Агент администрирования Kaspersky Security Center 10

I have tried every conversion I could think of from different encodings, using functions such as:

private function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    return mb_convert_encoding($content, 'UTF-8',
        mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}

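I have also tried iconv, mb_convert_encoding and similar functions, and converting the text to UTF-8 when opening the file, but nothing seems to work. Any suggestions as to what the problem might be?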

Answer 1

Problem solved. Checking the file with a detection function reported its encoding as ISO-8859-2, but the real encoding is UCS-2.
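For anyone hitting the same thing, here is a minimal sketch of reading such a file, assuming it is UCS-2/UTF-16 (little-endian when there is no byte-order mark, which is only a guess) and that the mbstring extension is available. The helper name read_utf16_as_utf8 is made up for this example and is not from the original code. The garbled sample output fits this diagnosis: in UTF-16 the letter г is U+0433, and its low byte 0x33 is the ASCII character 3.

```php
<?php
// Hypothetical helper (not from the original post): decode a UCS-2/UTF-16
// file to UTF-8 so that preg_match_all can work on it.
function read_utf16_as_utf8(string $path): string
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // Use the byte-order mark, if present, to pick the byte order.
    // Assumption: a file without a BOM is treated as little-endian.
    $bom = substr($raw, 0, 2);
    if ($bom === "\xFF\xFE") {
        $from = 'UTF-16LE';
        $raw = substr($raw, 2);
    } elseif ($bom === "\xFE\xFF") {
        $from = 'UTF-16BE';
        $raw = substr($raw, 2);
    } else {
        $from = 'UTF-16LE';
    }

    return mb_convert_encoding($raw, 'UTF-8', $from);
}

// Demo: write a small UTF-16LE file with a BOM, then match quoted strings.
// This source file itself must be saved as UTF-8.
$sample = '"Агент администрирования Kaspersky Security Center 10"';
$path = tempnam(sys_get_temp_dir(), 'ucs2');
file_put_contents($path, "\xFF\xFE" . mb_convert_encoding($sample, 'UTF-16LE', 'UTF-8'));

$content = read_utf16_as_utf8($path);
unlink($path);

// The /u modifier makes PCRE treat the pattern and subject as UTF-8.
preg_match_all('/"(.*)"/u', $content, $matches);
var_dump($matches[1]);
```

mb_detect_encoding cannot help here when it is only given 'UTF-8, ISO-8859-1' as candidates, because ISO-8859-1 accepts any byte sequence, so the source encoding has to be stated explicitly.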

php preg_match and Cyrillic characters in iso-8859-1
