Question

When I use exec open("tx.py") in Python on a file that contains non-ASCII characters, I get an error like this:

SyntaxError: Non-ASCII character '\xc3' in file tx.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

However, when I call compile(open("tx.py").read(), "tx.py", "exec"), I get no such error, and Python 2.7 happily compiles the file. How can I get the same SyntaxError from compile(...)?

Note that my goal is not to fix the SyntaxError, but to make compile(...) behave the same way exec does.

Answer 1

Looking at the source code, exec ends up calling PyTokenizer_FromFile, whereas compile() uses PyTokenizer_FromString. The two differ in how they set up the tokenizer.

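With PyTokenizer_FromFile, the tokenizer starts with an empty buffer and calls the fp_readl function to fill it. That call is what can trigger the exception you see (when the tokenizer has seen no encoding declaration and encounters a non-ASCII character). The file contents are then recoded to UTF-8 and processed by the tokenizer, which makes tokenizing easier. Later, the tokens are recoded back to the original codec.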
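To make the recoding step concrete, here is a rough Python illustration of the round trip. It is only a sketch of the idea, not the tokenizer's actual C code; the helper names recode_to_utf8 and recode_back are made up for this example, and Latin-1 stands in for whatever codec the file declares.

```python
def recode_to_utf8(raw, declared_encoding):
    """Hypothetical helper: source bytes in the declared codec -> UTF-8 bytes."""
    return raw.decode(declared_encoding).encode('utf-8')


def recode_back(utf8_bytes, declared_encoding):
    """Hypothetical helper: UTF-8 bytes -> bytes in the declared codec."""
    return utf8_bytes.decode('utf-8').encode(declared_encoding)


# 0xe9 is "e with acute accent" in Latin-1 (assumed codec for this example).
latin1_source = b'name = "caf\xe9"\n'

# What the tokenizer would work on internally: the same text as UTF-8.
utf8_source = recode_to_utf8(latin1_source, 'latin-1')

# Recoding back restores the original bytes.
restored = recode_back(utf8_source, 'latin-1')
```

The point is only that the tokenizer sees one uniform encoding internally, while string tokens end up in the file's own codec again.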

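With PyTokenizer_FromString, the buffer is set to the string that was passed in. The string is checked for a BOM and a PEP 263 coding comment, just as when reading from a file, but if neither sets a codec, the string is processed by the tokenizer as is and no recoding takes place. In that case the tokenizer's encoding field is left empty, just as it is for ASCII files. Because the buffer is already initialized and there is no file object, fp_readl is never called and the exception is never raised.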
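The BOM and coding-comment check described above can be sketched in Python as follows. This is an approximation for illustration, not the tokenizer's code: find_declared_encoding is a hypothetical name, the regular expression is the one given in PEP 263 (applied to byte strings here), and only the UTF-8 BOM is considered.

```python
import codecs
import re

# Coding declaration pattern as given in PEP 263, applied to bytes.
_CODING_RE = re.compile(br'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def find_declared_encoding(source):
    """Return the declared codec name of a byte string, or None.

    Hypothetical helper: a UTF-8 BOM counts as declaring 'utf-8';
    otherwise only the first two lines are searched for a coding comment.
    """
    if source.startswith(codecs.BOM_UTF8):
        return 'utf-8'
    for line in source.splitlines()[:2]:
        match = _CODING_RE.match(line)
        if match:
            return match.group(1).decode('ascii')
    return None
```

A result of None corresponds to the case where the tokenizer's encoding field stays empty and the string is tokenized as is.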

Because of these differences, you cannot force compile() to behave exactly like exec. You have to perform the same tests manually:

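Check for a BOM in the first bytes; test against the codecs.BOM_* constants.
Check for a coding comment in the first two lines.
If both are missing, try to decode the source as ASCII and manually raise a SyntaxError if decoding fails.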

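import codecs
import re

# UTF-8 BOM plus every explicit-endianness BOM constant in codecs (Python 2).
_boms = (codecs.BOM_UTF8,) + tuple(
    v for k, v in vars(codecs).iteritems()
    if k.startswith('BOM_') and k[-3:] in ('_LE', '_BE'))
# Coding comment; '.*?' also accepts forms like '# -*- coding: utf-8 -*-'.
_coding_line = re.compile(r'\s*#.*?coding[:=]\s*[-\w.]+').match

def compile_precheck(string):
    # A BOM counts as an encoding declaration.
    if string.startswith(_boms):
        return
    # So does a coding comment in the first two lines.
    for line in string.splitlines()[:2]:
        if _coding_line(line):
            return
    # Otherwise the source must be pure ASCII.
    try:
        string.decode('ascii')
    except UnicodeDecodeError:
        raise SyntaxError(
            "Non-ASCII character in source string but no encoding declared")

source = open("tx.py").read()
compile_precheck(source)
tx = compile(source, "tx.py", "exec")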
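
If you also want the file name and line number in the error, as exec reports them, the precheck and the compile() call can be wrapped in one function. The following is only a sketch under a few assumptions: strict_compile is a made-up name, the source is passed as a byte string, only the UTF-8 BOM is recognized, and the message merely imitates the wording of the real one.

```python
import codecs
import re

# Coding declaration pattern as given in PEP 263, applied to bytes.
_CODING_RE = re.compile(br'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')
_NON_ASCII_RE = re.compile(br'[\x80-\xff]')


def strict_compile(source, filename, mode):
    """Hypothetical helper: compile() that rejects undeclared non-ASCII bytes."""
    declared = source.startswith(codecs.BOM_UTF8) or any(
        _CODING_RE.match(line) for line in source.splitlines()[:2])
    if not declared:
        match = _NON_ASCII_RE.search(source)
        if match is not None:
            # Line number of the first offending byte, counted from 1.
            lineno = source.count(b'\n', 0, match.start()) + 1
            err = SyntaxError(
                "Non-ASCII character %r in file %s on line %d, "
                "but no encoding declared"
                % (match.group(), filename, lineno))
            err.filename = filename
            err.lineno = lineno
            raise err
    return compile(source, filename, mode)


# Usage: an undeclared UTF-8 byte sequence is rejected before compile() runs.
try:
    strict_compile(b'x = "\xc3\xa9"\n', "tx.py", "exec")
except SyntaxError as exc:
    print(exc)
```

Setting filename and lineno on the exception lets callers handle it like any other SyntaxError raised by the compiler.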

Answer 2

With this line:

tx = compile(open("tx.py").read().decode('ascii'), "tx.py", "exec")

Or:

import codecs
tx = compile(codecs.open("tx.py", encoding='ascii').read(), "tx.py", "exec")

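I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 18: ordinal not in range(128)

This is the closest I could get to your SyntaxError.

Edit: you can write your own custom compile function and convert the exception into the expected error: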

def custom_compile(source, *args, **kwargs):
    try:
        return compile(source.decode('ascii'), *args, **kwargs)
    except UnicodeDecodeError as error:
        raise SyntaxError(error)

tx = custom_compile(open("tx.py").read(), "tx.py", "exec")

Error:

SyntaxError: 'ascii' codec can't decode byte 0xe9 in position 17: ordinal not in range(128)

How to get an encoding error from compile(...)
