Question

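All my Python source code is encoded in UTF-8, and the encoding is declared at the top of each file.

But sometimes the u prefix in front of a unicode string is missing.

Example:

Umlauts = "üöä"

The above is a byte string that contains non-ASCII characters, and this causes trouble (UnicodeDecodeError).

I tried pylint and python -3, but neither of them gave me a warning.

I am looking for an automated way to find non-ASCII characters in byte strings.

My source code needs to support Python 2.6 and Python 2.7.

I know this well-known error: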

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)
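
The decode step behind this message can be reproduced in isolation. The following is a minimal sketch, not taken from the original code base: it encodes the umlauts to UTF-8 and then decodes them with the ASCII codec explicitly, which is what Python 2 does implicitly when a byte string is combined with a unicode string. The variable names are illustrative, and the reported position is 0 here rather than 7 because the sample string starts with a non-ASCII byte.

```python
# -*- coding: utf-8 -*-

# The UTF-8 bytes that a Python 2 byte-string literal "üöä" would hold.
raw = u"üöä".encode("utf-8")

try:
    # Python 2 performs this ASCII decode implicitly whenever a byte string
    # is mixed with a unicode string, for example u"x" + "üöä".
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```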

BTW: this question is only about Python source code, not about strings read from files or sockets.

Solution

For projects that need to support Python 2.6+, I will use __future__.unicode_literals.
For projects that need to support 2.5, I will use thg435's solution (the ast module).
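
As a rough illustration of the first option, here is a minimal sketch of what the future import changes. It assumes the import is the first statement in the module (Python requires that), and the variable name is only an example.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# No u prefix, yet this literal is a unicode string on Python 2.6+.
umlauts = "üöä"

# Three characters, not the six UTF-8 bytes a plain byte string would hold.
print(len(umlauts))

# Encoding explicitly yields the byte form for the places that really need it.
print(len(umlauts.encode("utf-8")))
```

One thing to keep in mind: the import affects every un-prefixed literal in the module, so literals that must stay byte strings need an explicit b prefix.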

Answer 1

Naturally, you'll want to use Python for this!

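import ast, re

# Python 2 only: parse the script and inspect every string literal in it.
with open("your_script.py") as fp:
    tree = ast.parse(fp.read())

for node in ast.walk(tree):
    # node.s is a str (byte string) for plain literals, unicode for u"..." ones.
    if (isinstance(node, ast.Str)
            and isinstance(node.s, str)
            and re.search(r'[\x80-\xFF]', node.s)):
        print 'bad string %r line %d col %d' % (node.s, node.lineno, node.col_offset)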

Note that this does not distinguish between literal and escaped non-ASCII characters (fuß and fu\xdf).
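
To tell the two cases apart, one option is to look at the source text of each literal instead of its parsed value. The sketch below is an alternative to the ast approach, not the answer's method, and it rests on several assumptions: it uses the standard tokenize module, it is written for a Python 3 interpreter that scans source passed in as a string, and it treats any literal whose prefix lacks u as a byte string (true only for Python 2 code that does not import unicode_literals). The function name raw_non_ascii_literals and the sample source are hypothetical.

```python
import io
import re
import tokenize

NON_ASCII = re.compile(r"[^\x00-\x7f]")
PREFIX = re.compile(r"[A-Za-z]*")  # leading prefix letters such as u, b, r


def raw_non_ascii_literals(source):
    """Return (line, col, literal) for literals without a u prefix
    whose source text contains unescaped non-ASCII characters."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        prefix = PREFIX.match(tok.string).group(0).lower()
        # tok.string is the literal exactly as written, escapes included,
        # so "fu\xdf" is pure ASCII here while "fuß" is not.
        if "u" not in prefix and NON_ASCII.search(tok.string):
            hits.append((tok.start[0], tok.start[1], tok.string))
    return hits


sample = (
    'a = "fuß"\n'      # raw non-ASCII in a byte string: reported
    'b = "fu\\xdf"\n'  # escaped, ASCII-only source text: ignored
    'c = u"fuß"\n'     # unicode literal: ignored
)

for line, col, literal in raw_non_ascii_literals(sample):
    print("raw non-ASCII literal %s at line %d col %d" % (literal, line, col))
```

For a real file, the contents could be read with io.open(path, encoding="utf-8") and passed to the function as a string.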

Finding non-ASCII byte strings in Python source code
