What are some valid and invalid UTF-8 strings I can use for unit tests?

Time: 2012-07-20 21:09:17

Tags: php character-encoding

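I have written two functions in PHP, str_to_utf8() and seems_utf8() (both assembled from pieces I borrowed from other code). I am now writing unit tests for them and want to make sure the test cases are adequate. For now I am using the ones I took from Facebook: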

public function test_str_to_utf8()
{
    // Make sure ASCII characters are ignored
    $this->assertEquals( "this\x01 is a \x7f test string", str_to_utf8( "this\x01 is a \x7f test string" ) );

    // Make sure UTF8 characters are ignored
    $this->assertEquals( "\xc3\x9c \xc3\xbc \xe6\x9d\xb1!", str_to_utf8( "\xc3\x9c \xc3\xbc \xe6\x9d\xb1!" ) );

    // Test long strings
    #str_to_utf8( str_repeat( 'x', 1024 * 1024 ) );
    $this->assertEquals( TRUE, TRUE );

    // Test some invalid UTF8 to see if it is properly fixed
    $input = "\xc3 this has \xe6\x9d some invalid utf8 \xe6";
    $expect = "\xEF\xBF\xBD this has \xEF\xBF\xBD\xEF\xBF\xBD some invalid utf8 \xEF\xBF\xBD";
    $this->assertEquals( $expect, str_to_utf8( $input ) );
}

Are those valid test cases?

1 Answer:

Answer 0: (score: 1)

I found this resource useful when testing UTF-8.

If you use any non-Latin-1 text, you need to make sure your PHP files are saved as UTF-8, or else write the strings pre-escaped as byte sequences.
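To illustrate the pre-escaping point, here is a minimal sketch of test data written entirely as "\x.." escapes, so it does not depend on how the file is saved. The helper is_valid_utf8() and the arrays $valid and $invalid are hypothetical names made up for this example; they are not part of the question's code. The check relies on PCRE's /u modifier, which makes preg_match() fail on malformed UTF-8. It is only a way to sanity-check the samples, not a replacement for testing seems_utf8() itself.

```php
<?php
// Hypothetical helper (not from the question): preg_match() with the /u
// modifier returns 1 for well-formed UTF-8 and false for malformed input.
function is_valid_utf8($bytes)
{
    return preg_match('//u', $bytes) === 1;
}

// Well-formed sequences, one per encoded length.
$valid = array(
    'ascii'      => "plain ascii \x01 \x7f",
    'two-byte'   => "\xc3\x9c",          // U+00DC
    'three-byte' => "\xe6\x9d\xb1",      // U+6771
    'four-byte'  => "\xf0\x9f\x98\x80",  // U+1F600
);

// Malformed sequences covering the usual failure classes.
$invalid = array(
    'lone lead byte'       => "\xc3",
    'truncated three-byte' => "\xe6\x9d",
    'lone continuation'    => "\x80",
    'overlong encoding'    => "\xc0\xaf",         // overlong form of "/"
    'UTF-16 surrogate'     => "\xed\xa0\x80",     // U+D800
    'above U+10FFFF'       => "\xf4\x90\x80\x80",
    'byte never used'      => "\xff",
);

foreach ($valid as $label => $bytes) {
    echo $label, ': ', is_valid_utf8($bytes) ? 'valid' : 'INVALID', "\n";
}
foreach ($invalid as $label => $bytes) {
    echo $label, ': ', is_valid_utf8($bytes) ? 'VALID' : 'invalid', "\n";
}
```

Each entry in $valid could be passed to str_to_utf8() with the expectation that it comes back unchanged, and each entry in $invalid could be passed to seems_utf8() with the expectation of a false result. What str_to_utf8() should return for the invalid entries depends on its replacement policy, so those expected values are best written out by hand, as in the question's last test.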