Question

I'm reading a plain ASCII HTML file (charset=utf-8) that contains this string:

<title>Whatâ€™s New?</title>

This string is unusable as it stands, so I came up with this function (to be used after the string has been hexlify'd). Then I added a docstring test:

def to_be_replaced(reprString):
    """
    :reprString: a repr(string) -- won't work otherwise

    >>> s = "<title>Whatâ€™s New?</title>"
    >>> r = repr(s)
    >>> print r
    <title>What\xe2\x80\x99s New?</title>
    >>> to_be_replaced(r)
    set(['\xe2\x80\x99'])
    """
    regex = re.compile('([\x7f-\xff]{2,})')
    return set(re.findall(regex, reprString))

Unfortunately, the test fails:

>"E:\Python27\pythonw.exe" -u "test_to_be_replaced.py"

to be replaced: set(['\xe2\x80\x99'])

**********************************************************************
File "test_to_be_replaced.py", line 14, in __main__.to_be_replaced
Failed example:
    print r
Expected:
    <title>Whatâ€™s New?</title>
Got:
    '<title>What\xe2\x80\x99s New?</title>'
**********************************************************************
File "test_to_be_replaced.py", line 16, in __main__.to_be_replaced
Failed example:
    to_be_replaced(r)
Expected:
    set(['â€™'])
Got:
    set([])
**********************************************************************
1 items had failures:
   2 of   4 in __main__.to_be_replaced
***Test Failed*** 2 failures.
>Exit code: 0

The output above comes from:

if __name__ == '__main__':
    s = '<title>What\xe2\x80\x99s New?</title>'
    print 'to be replaced:', to_be_replaced(s)  # works as intended
    import doctest
    doctest.testmod()

What should I do to make the test pass?

Using Python 2.7.10 on Windows 7 x32.
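
For reference, the two failures above are consistent with three things: the docstring is not a raw string, so `\xe2\x80\x99` is turned into real bytes before doctest compares anything; `repr()` wraps its result in quotes, which the expected output omits; and the regex matches high bytes, while `repr(s)` contains only ASCII backslash escapes. Below is a minimal sketch of a passing variant. It is an assumption-laden rewrite, not the original code: it targets Python 3 (hence the `b` prefixes and `print(...)`), renames the parameter to `data`, and passes the byte string itself rather than its repr. On Python 2.7 the same idea should apply with plain `str` literals and a `r"""` docstring.

```python
import doctest
import re


def to_be_replaced(data):
    r"""Return every run of two or more non-ASCII bytes found in data.

    The r prefix keeps the backslash escapes below as literal text, so
    doctest sees exactly what is typed here.

    >>> s = b"<title>What\xe2\x80\x99s New?</title>"
    >>> print(repr(s))
    b'<title>What\xe2\x80\x99s New?</title>'
    >>> to_be_replaced(s)
    {b'\xe2\x80\x99'}
    >>> to_be_replaced(repr(s).encode("ascii"))
    set()
    """
    # Byte-string pattern: a run of at least two bytes in the range 0x7f-0xff.
    regex = re.compile(rb"([\x7f-\xff]{2,})")
    return set(regex.findall(data))


if __name__ == "__main__":
    doctest.testmod()
```

The last doctest example shows why passing `repr(s)` finds nothing: the repr holds the four ASCII characters `\`, `x`, `e`, `2` and so on, not the bytes themselves.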

Why does doctest fail on strings that contain UTF-8 characters?

0 answers: