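I am reading a plain ASCII HTML file (charset=utf-8) that contains this string:
<title>What’s New?</title>
The string is unprintable, so I came up with this function (meant to be used after the string has been hexlify'd). Then I added a docstring test: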
def to_be_replaced(reprString):
    """
    :reprString: a repr(string) -- won't work otherwise
    >>> s = "<title>What’s New?</title>"
    >>> r = repr(s)
    >>> print r
    <title>What\xe2\x80\x99s New?</title>
    >>> to_be_replaced(r)
    set(['\xe2\x80\x99'])
    """
    regex = re.compile('([\x7f-\xff]{2,})')
    return set(re.findall(regex, reprString))
Unfortunately, the test fails:
>"E:\Python27\pythonw.exe" -u "test_to_be_replaced.py"
to be replaced: set(['\xe2\x80\x99'])
**********************************************************************
File "test_to_be_replaced.py", line 14, in __main__.to_be_replaced
Failed example:
    print r
Expected:
    <title>What’s New?</title>
Got:
    '<title>What\xe2\x80\x99s New?</title>'
**********************************************************************
File "test_to_be_replaced.py", line 16, in __main__.to_be_replaced
Failed example:
    to_be_replaced(r)
Expected:
    set(['’'])
Got:
    set([])
**********************************************************************
1 items had failures:
2 of 4 in __main__.to_be_replaced
***Test Failed*** 2 failures.
>Exit code: 0
The output above comes from:
if __name__ == '__main__':
    s = '<title>What\xe2\x80\x99s New?</title>'
    print 'to be replaced:', to_be_replaced(s)  # works as intended
    import doctest
    doctest.testmod()
What should I do to make the test pass?
I am using Python 2.7.10 on Windows 7 x32.
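
To narrow down the difference between the direct call (which works) and the doctest call (which returns an empty set), here is a minimal sketch that runs the same character class against the raw bytes and against their `repr()`. It is one reading of the output above, not a confirmed diagnosis. The names `HIGH_BYTES` and `find_high_byte_runs` are hypothetical, and the sketch uses byte-string literals so that it behaves the same on Python 2.7 and Python 3.

```python
import re

# Same character class as in to_be_replaced(), written as a bytes pattern.
# HIGH_BYTES and find_high_byte_runs are hypothetical names for this sketch.
HIGH_BYTES = re.compile(b'[\x7f-\xff]{2,}')


def find_high_byte_runs(data):
    """Return the set of runs of two or more bytes in the range 0x7f-0xff."""
    return set(HIGH_BYTES.findall(data))


# The raw UTF-8 bytes, as in the __main__ block of the question.
raw = b'<title>What\xe2\x80\x99s New?</title>'

# repr() replaces each high byte with a backslash escape spelled out in
# plain ASCII characters, so no byte >= 0x7f is left for the pattern to match.
escaped = repr(raw).encode('ascii')

print(find_high_byte_runs(raw))      # one run: the three bytes of the apostrophe
print(find_high_byte_runs(escaped))  # empty set
```

If this reading is right, the doctest would need to pass the raw string rather than `repr(s)` to the function. Separately, backslashes in a non-raw docstring are interpreted by Python before doctest sees them, which is why the failure report shows `’` in the expected output; a raw docstring (`r"""..."""`) keeps them literal.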