Why does doctest fail on strings containing UTF-8 characters?

Date: 2016-08-08 08:00:14

Tags: python html utf-8 doctest

I am reading a plain ASCII HTML file (charset=utf-8) that contains this string:

<title>What’s New?</title>

This string is not printable as is, so I came up with this function (meant to be used after the string has been hexlify'd). I then added a docstring test:

def to_be_replaced(reprString):
    """
    :reprString: a repr(string) -- won't work otherwise

    >>> s = "<title>What’s New?</title>"
    >>> r = repr(s)
    >>> print r
    <title>What\xe2\x80\x99s New?</title>
    >>> to_be_replaced(r)
    set(['\xe2\x80\x99'])
    """
    regex = re.compile('([\x7f-\xff]{2,})')
    return set(re.findall(regex, reprString))

Unfortunately, the test fails:

>"E:\Python27\pythonw.exe" -u "test_to_be_replaced.py"

to be replaced: set(['\xe2\x80\x99'])

**********************************************************************
File "test_to_be_replaced.py", line 14, in __main__.to_be_replaced
Failed example:
    print r
Expected:
    <title>What’s New?</title>
Got:
    '<title>What\xe2\x80\x99s New?</title>'
**********************************************************************
File "test_to_be_replaced.py", line 16, in __main__.to_be_replaced
Failed example:
    to_be_replaced(r)
Expected:
    set(['’'])
Got:
    set([])
**********************************************************************
1 items had failures:
   2 of   4 in __main__.to_be_replaced
***Test Failed*** 2 failures.
>Exit code: 0
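
One likely reason the "Expected" text differs from what was typed in the docstring: the docstring is an ordinary string literal, so Python converts `\xe2\x80\x99` into the actual characters before doctest ever parses the examples. The sketch below only illustrates that effect; the functions `plain` and `raw` are hypothetical and not part of the script above.

```python
def plain():
    """expects: What\xe2\x80\x99s"""


def raw():
    r"""expects: What\xe2\x80\x99s"""


# The ordinary docstring no longer contains a backslash followed by "x":
# the three escapes were already turned into three single characters.
print('\\x' in plain.__doc__)  # False

# The raw docstring keeps the escapes exactly as typed, which is what
# doctest needs in order to compare them with repr-style output.
print('\\x' in raw.__doc__)    # True
```

Separately, `print r` shows surrounding quotes in the "Got" text because `r` is `repr(s)`, so an expected line for that example would need the quotes as well.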

The output above comes from:

if __name__ == '__main__':
    s = '<title>What\xe2\x80\x99s New?</title>'
    print 'to be replaced:', to_be_replaced(s)  # works as intended
    import doctest
    doctest.testmod()
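
A possible explanation for the empty set in the second failure: the `__main__` block passes the byte string itself, while the doctest passes `repr(s)`, which is plain ASCII text (a backslash, an `x` and hex digits) and contains no bytes above `\x7f`. A minimal sketch, assuming the same pattern as in the function; the names `HIGH_BYTES`, `s` and `r` are illustrative, and `b''` literals are used so that it runs on both Python 2.7 and Python 3:

```python
import re

# Same character class as in to_be_replaced, written as a bytes pattern.
HIGH_BYTES = re.compile(b'([\x7f-\xff]{2,})')

s = b'<title>What\xe2\x80\x99s New?</title>'

# repr() yields pure ASCII text, so encoding it to bytes is lossless.
r = repr(s).encode('ascii')

# The real bytes contain one run of three high bytes.
print(HIGH_BYTES.findall(s))

# The repr text only contains the ASCII spelling of those bytes,
# so the pattern finds nothing.
print(HIGH_BYTES.findall(r))
```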

What should I do to make the test pass?

Using Python 2.7.10 on Windows 7 x32.
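
One way the test might be made to pass, sketched under two assumptions: the docstring is made raw (`r"""`) so doctest sees the escapes as typed, and the function receives the byte string itself rather than its `repr`. The parameter name `data` is a renaming for illustration. Comparing against a set inside the example also avoids depending on how a set is printed. This is a sketch, not necessarily the only fix:

```python
import doctest
import re


def to_be_replaced(data):
    r"""Return the set of runs of two or more high bytes found in data.

    :data: the byte string itself, not repr(string)

    >>> s = b'<title>What\xe2\x80\x99s New?</title>'
    >>> to_be_replaced(s) == set([b'\xe2\x80\x99'])
    True
    """
    # Bytes pattern: matches two or more consecutive bytes in \x7f-\xff.
    regex = re.compile(b'([\x7f-\xff]{2,})')
    return set(regex.findall(data))


if __name__ == '__main__':
    # Prints nothing when every example passes.
    doctest.testmod()
```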

0 answers:

No answers yet.