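I have to read a file and identify its encoding. I used mb_detect_encoding() to detect UTF-16, but the result is wrong. How can I detect the UTF-16 encoding type in PHP?
The PHP file is UTF-16, and my header is windows-1256 (because of Arabic).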
header('Content-Type: text/html; charset=windows-1256');
$delimiter = '\t';
$matchesz = array(); // collected fields, one entry per line of the file
$f = file($fileName);
foreach ($f as $dailystatmet)
{
    $transactionData = str_replace("'", '', $dailystatmet);
    preg_match_all("/('?\d+,\d+\.\d+)?([a-zA-Z]|[0-9]|)[^".$delimiter."]+/", $transactionData, $matches);
    array_push($matchesz, $matches[0]);
}
$searchKeywords = array("apple", "orange", 'mango');
$rowCount = count($matchesz);
// Row 0 is skipped; the last valid index is $rowCount - 1.
for ($row = 1; $row < $rowCount; $row++) {
    $myRow = $row;
    $cell = $matchesz[$row];
    foreach ($searchKeywords as $val) {
        // $c_description (the index of the description column) is not defined in this snippet.
        if (partialArraySearch($cell[$c_description], $val)) {
        }
    }
}
// Case-insensitive substring test.
function partialArraySearch($cell, $searchword)
{
    if (strpos(strtoupper($cell), strtoupper($searchword)) !== false) {
        return true;
    }
    return false;
}
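The code above searches within an uploaded file. If the file is in UTF-8, matches are found, but when the same file is saved as UTF-16 or UTF-32, no results are returned.
So how can I get the encoding type of the uploaded file?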
Answer 0 (score: 1)
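In case anyone is still looking for a solution: I hacked something like this together in the voku/portable-utf8 repo on GitHub => UTF8::file_get_contents()
This file_get_contents wrapper detects the current encoding via UTF8::str_detect_encoding() and automatically converts the file content to UTF-8.
For example, from the PHPUnit tests: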
$testString = UTF8::file_get_contents(dirname(__FILE__) . '/test1Utf16pe.txt');
$this->assertContains('<p>Today’s Internet users are not the same users who were online a decade ago. There are better connections.', $testString);
$testString = UTF8::file_get_contents(dirname(__FILE__) . '/test1Utf16le.txt');
$this->assertContains('<p>Today’s Internet users are not the same users who were online a decade ago. There are better connections.', $testString);
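For readers who cannot pull in the library, the same idea (detect the encoding, then convert to UTF-8) can be approximated by looking at the byte order mark. This is only a minimal sketch and not how portable-utf8 does it: detectBom() and bomToUtf8() are hypothetical names, the mbstring extension is assumed to be loaded, and files without a BOM are simply reported as undetected.

```php
<?php
// Hypothetical helper for illustration; not the portable-utf8 implementation.
// Returns the encoding named by a leading byte order mark, or null if there is none.
function detectBom(string $bytes): ?array
{
    // Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE),
    // so the 4-byte marks are tested first.
    $boms = [
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];
    foreach ($boms as $encoding => $bom) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return ['encoding' => $encoding, 'length' => strlen($bom)];
        }
    }
    return null;
}

// Strips the BOM and converts the remaining bytes to UTF-8.
// Returns null when no BOM is present, so the caller can fall back to another heuristic.
function bomToUtf8(string $bytes): ?string
{
    $bom = detectBom($bytes);
    if ($bom === null) {
        return null;
    }
    return mb_convert_encoding(substr($bytes, $bom['length']), 'UTF-8', $bom['encoding']);
}
```

In the question's code this would be applied to the raw bytes, for example bomToUtf8(file_get_contents($fileName)), before splitting the result into lines and running the regular expression.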
Answer 1 (score: 1)
My solution is to detect UTF-16 and convert the content to ISO-8859-15 (Latin-9):
// Count the NUL bytes; the empty-string check avoids a division by zero.
$nulCount = substr_count($content, "\x00");
if ($content !== '' && $nulCount / strlen($content) > 0.4) {
    $content = iconv('UTF-16', 'ISO-8859-15', $content);
}
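In other words, I check how often the byte 0x00 occurs. If its share of all bytes is above 0.4, the text probably consists of characters from the basic set encoded in UTF-16. Such characters take two bytes each, but one of the two (the second, in little-endian order) is usually 00.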
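The same frequency check can also hint at the byte order, since the position of the 00 bytes differs between little-endian and big-endian data. The following is a rough sketch under that assumption; guessUtf16ByteOrder() is a hypothetical name and the 0.4 threshold is taken from the snippet above. Note that it only works for text in the ASCII/Latin-1 range: Arabic letters in UTF-16 have 0x06 as their high byte, so an Arabic file would not pass this test.

```php
<?php
// Hypothetical helper: guesses the UTF-16 byte order from where the 0x00 bytes fall.
// Returns 'UTF-16LE', 'UTF-16BE', or null when the NUL share is at or below the threshold.
function guessUtf16ByteOrder(string $content, float $threshold = 0.4): ?string
{
    $length = strlen($content);
    if ($length < 2) {
        return null;
    }

    $even = 0; // NUL bytes at offsets 0, 2, 4, ...
    $odd = 0;  // NUL bytes at offsets 1, 3, 5, ...
    for ($i = 0; $i < $length; $i++) {
        if ($content[$i] === "\x00") {
            if ($i % 2 === 0) {
                $even++;
            } else {
                $odd++;
            }
        }
    }

    if (($even + $odd) / $length <= $threshold) {
        return null;
    }

    // For ASCII-range text the zero high byte comes second in little-endian
    // data (odd offsets) and first in big-endian data (even offsets).
    return $odd >= $even ? 'UTF-16LE' : 'UTF-16BE';
}
```

The returned name can be passed to iconv() or mb_convert_encoding() as the source encoding instead of the generic 'UTF-16'.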