Question

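From PEP 263:

To define a source code encoding, a magic comment must be placed into the source file as either the first or second line in the file, such as: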

# coding=<encoding name>

or (using formats recognized by popular editors):

#!/usr/bin/python
# -*- coding: <encoding name> -*-

However, in many cases license information occupies the topmost lines, e.g. from https://github.com/google/seq2seq/blob/master/seq2seq/training/utils.py:

# Copyright 2017 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# -*- coding: utf-8 -*-
"""Miscellaneous training utility functions.
"""

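Will the encoding declaration still be picked up "magically" by the Python interpreter? It would be great if an answer also explained why it has to be within the first two lines and pointed to the interpreter code.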

Answer 1

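Yes. In Python 2, where the encoding marker is required for UTF-8 source, placing it beyond the second line raises an error like the following if the file contains any non-ASCII characters:

    File "encoded.py", line 5
    SyntaxError: Non-ASCII character '\xe1' in file encoded.py on line 5, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

If the file contains only ASCII characters, it still works even when the UTF-8 encoding marker comes later than line 2. ASCII is a subset of UTF-8, so the late encoding directive is effectively ignored. (This appears to be the case for the specific file you cited.)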

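Many parsers and other file processors require such magic directives to sit at the start of the file, because they have to be scanned for and taken into account before the file can be interpreted correctly. Allowing them later would be inefficient: the whole file would have to be scanned just to find one "magic" special case.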

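You get some leeway in Python 3, which assumes UTF-8 encoding by default. You would still want to include the declaration if your file is encoded in some other way.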
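As a minimal sketch of how this plays out in Python 3, the standard library's `tokenize.detect_encoding` can be asked which encoding it would pick; it is documented to call `readline` at most twice. The sample sources and the helper name `declared_encoding` are made up for illustration.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python 3 would use to decode `source`."""
    # detect_encoding reads at most two lines, looking for a BOM or a cookie.
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Cookie on line 2, after a shebang: honored.
on_line_2 = b"#!/usr/bin/python\n# -*- coding: latin-1 -*-\nx = 1\n"

# Cookie on line 3, after two other comment lines: ignored,
# so the Python 3 default (UTF-8) applies.
on_line_3 = b"# Copyright notice\n#\n# -*- coding: latin-1 -*-\nx = 1\n"

print(declared_encoding(on_line_2))  # iso-8859-1 (normalized name for latin-1)
print(declared_encoding(on_line_3))  # utf-8
```

The second result is the same as for a file with no declaration at all, which is why a late `utf-8` marker is harmless but a late marker for any other encoding silently does nothing.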

Answer 2

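The spec allows either of the first two lines so that a shebang #!... can occupy the first line on Unix systems.

No, the declaration is not honored after the second line.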

Here is the code from CPython's tokenizer that checks for (and parses) the encoding cookie: https://github.com/python/cpython/blob/9e52c907b5511393ab7e44321e9521fe0967e34d/Parser/tokenizer.c#L613-L616
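The linked C code is not reproduced here. As a rough Python sketch of the same rule, the function below applies the regular expression given in PEP 263 to the first two lines only. The name `find_coding_cookie` is hypothetical, and details such as BOM handling and the requirement that line 1 be comment-only for a line-2 declaration to count are left out.

```python
import re

# Regular expression for the encoding declaration, as given in PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_coding_cookie(source: str):
    """Return the declared encoding name, or None if lines 1-2 declare none."""
    # Only the first two lines are inspected; later lines are never looked at.
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


# Hypothetical header mimicking a license block that pushes the cookie to line 3.
licensed = "# Copyright 2017 Google Inc.\n#\n# -*- coding: utf-8 -*-\n"

print(find_coding_cookie("#!/usr/bin/python\n# -*- coding: utf-8 -*-\n"))  # utf-8
print(find_coding_cookie(licensed))  # None
```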

Does the encoding declaration have to be on line 1 or line 2 in Python?
