Question

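OK - I'm using file_get_contents() to read the contents of a txt file. I checked the encoding with this line: $encoding = mb_detect_encoding($texts, 'auto'); and the output is ASCII.

ASCII text is normally also valid UTF-8, but instead of the long dash I get these funny symbols: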

â€“

With $texts = iconv('ASCII', 'UTF-8//IGNORE', $texts); I can remove these symbols, but I would like to keep them.

I also tried replacing them with a normal dash:

$texts=str_replace('â€“', '-', $texts);

But that doesn't work. There may also be other strange symbols - how can I encode them correctly or replace them with similar symbols?

Answer 1

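The useful forceutf8 library can process and repair strings with mixed encodings:

PHP class Encoding, featuring the popular Encoding::toUTF8() function - formerly known as forceUTF8() - that fixes mixed-encoded strings.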
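For the specific symptom in the question, one common cause is text that was UTF-8, was read as Windows-1252, and was then saved as UTF-8 again, which turns an en dash into â€“. The sketch below undoes one such round using only mbstring. It is not how forceutf8 is implemented; `undoDoubleEncoding` is a made-up name, and the sketch assumes the whole string was double encoded rather than mixed.

```php
<?php
// Hypothetical helper (not part of forceutf8): reverse one round of
// "UTF-8 bytes misread as Windows-1252 and saved as UTF-8 again".
function undoDoubleEncoding(string $text): string
{
    // Leave anything that is not valid UTF-8 untouched.
    if (!mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Map every character back to the single Windows-1252 byte it stands for.
    $bytes = mb_convert_encoding($text, 'Windows-1252', 'UTF-8');
    // Accept the result only if nothing was lost on the way
    // and the recovered bytes are themselves valid UTF-8.
    $lossless = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252') === $text;
    if ($lossless && mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return $text;
}

// "2010 â€“ 2020" written out as UTF-8 bytes (illustrative sample).
$garbled = "2010 \xc3\xa2\xe2\x82\xac\xe2\x80\x9c 2020";
echo undoDoubleEncoding($garbled), "\n";
```

Plain ASCII and correctly encoded UTF-8 such as "café" should come back unchanged, because the recovered bytes fail the UTF-8 validity check. Strings that mix good and double-encoded parts are not handled here, which is the case the library is meant for.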

Answer 2

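I finally found the answer on this site, in the form of this function.

I tried a lot of different things, but in the end this was the only thing that worked.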

Answer 3

The function, written by Alan Whipple and referred to by user2718671, is quoted below in case that site disappears:

    function cleanEncoding( $text, $type='standard' ){
        // determine the encoding before we touch it
        $encoding = mb_detect_encoding($text, 'UTF-8, ISO-8859-1');
        // The characters to output
        if ( $type=='standard' ){
            $outp_chr = array('...',          "'",            "'",            '"',            '"',            '•',             '-',            '-'); // run of the mill standard characters
        } elseif ( $type=='reference' ) {
            $outp_chr = array('&#8230;',      '&#8216;',      '&#8217;',      '&#8220;',      '&#8221;',      '&#8226;',      '&#8211;',      '&#8212;'); // decimal numerical character references
        }
        // The characters to replace (purposely indented for comparison)
            // every entry must be double-quoted, otherwise PHP does not interpret the \x escapes
            $utf8_chr = array("\xe2\x80\xa6", "\xe2\x80\x98", "\xe2\x80\x99", "\xe2\x80\x9c", "\xe2\x80\x9d", "\xe2\x80\xa2", "\xe2\x80\x93", "\xe2\x80\x94"); // UTF-8 byte sequences
            $winc_chr = array(chr(133),       chr(145),       chr(146),       chr(147),       chr(148),       chr(149),       chr(150),       chr(151)); // single-byte Windows-1252 characters (outside ASCII)
        // First, replace UTF-8 characters.
        $text = str_replace( $utf8_chr, $outp_chr, $text);
        // Next, replace Windows-1252 characters.
        $text = str_replace( $winc_chr, $outp_chr, $text);
        // even if the string seems to be UTF-8, we can't trust it, so convert it to UTF-8 anyway
        $text = mb_convert_encoding($text, 'UTF-8', $encoding);
        return $text;
    }
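Before deciding which of the two replacement tables in the function applies, it can help to look at the raw bytes, since the same dash may be stored in several ways. Here is a minimal sketch; `showHighBytes` and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: show every run of non-ASCII bytes as hex in brackets.
function showHighBytes(string $text): string
{
    // No /u modifier, so the pattern matches raw bytes, not characters.
    return preg_replace_callback(
        '/[\x80-\xff]+/',
        function (array $m): string {
            return '[' . strtoupper(bin2hex($m[0])) . ']';
        },
        $text
    );
}

// Illustrative samples: one en dash, stored three different ways.
$samples = array(
    'UTF-8'          => "a \xe2\x80\x93 b",
    'Windows-1252'   => "a \x96 b",
    'double encoded' => "a \xc3\xa2\xe2\x82\xac\xe2\x80\x9c b",
);

foreach ($samples as $label => $text) {
    $valid = mb_check_encoding($text, 'UTF-8') ? 'valid UTF-8' : 'not valid UTF-8';
    echo $label, ': ', showHighBytes($text), ' (', $valid, ")\n";
}
```

If the dash shows up as [E28093], the text is already UTF-8 and is probably just being displayed with the wrong charset; [96] is the Windows-1252 byte that $winc_chr targets. This may also explain why the str_replace() in the question had no effect: a literal 'â€“' typed into a UTF-8 source file is the longer double-encoded sequence, which matches neither of the first two cases.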

How to properly encode ASCII text with PHP?
