Question

Why does printing these \x values give different results on different operating systems and Python versions? Example:

print("A"*20+"\xef\xbe\xad\xde")

This gives different output in Python 3 and Python 2, and on different platforms.

On Microsoft Windows:

Python2: AAAAAAAAAAAAAAAAAAAAï¾Þ

Python3: AAAAAAAAAAAAAAAAAAAAï¾Þ

On Kali Linux:

Python2: AAAAAAAAAAAAAAAAAAAAﾭ

Python3: AAAAAAAAAAAAAAAAAAAAï¾Þ

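UPDATE: What I want is the exact Python2 output, but with Python3. I tried many things (encoding, decoding, bytes conversion) but realized that \xde cannot be decoded. Is there any other way to achieve what I want?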

Answer 1

This is an encoding problem.

In Latin1 or Windows 1252 encoding, you have:

0xef -> ï (LATIN SMALL LETTER I WITH DIAERESIS)
0xbe -> ¾ (VULGAR FRACTION THREE QUARTERS)
0xad -> soft hyphen (U+00AD), normally invisible, hence not shown in your examples
0xde -> Þ (LATIN CAPITAL LETTER THORN)

In utf-8 encoding, you have:

'\xef\xbe\xad' -> u'\uffad' or 'ﾭ' (HALFWIDTH HANGUL LETTER RIEUL-SIOS)
'\xde' -> should raise UnicodeDecodeError...

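On Windows, both Python2 and Python3 use the Windows 1252 code page (in your example). On Kali, Python2 treats the string as a byte string and the terminal displays it as utf8, whereas Python3 assumes the string already contains unicode character values and displays them directly.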

Since in Latin1 (and in Windows 1252 for every character outside 0x80-0x9f) the byte value is the unicode code point, that is enough to explain your output.
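To make the mapping above concrete, here is a small sketch that decodes the same four bytes under each encoding. It assumes Python 3 (it relies on the `ascii` builtin and on `str` being unicode); the variable names are mine, not from the question.

```python
data = b"\xef\xbe\xad\xde"

# Latin-1: each byte becomes the code point with the same number.
latin1_text = data.decode("latin-1")
latin1_code_points = [ord(ch) for ch in latin1_text]

# cp1252 differs from Latin-1 only in 0x80-0x9f, so it agrees here.
cp1252_text = data.decode("cp1252")

# UTF-8: the first three bytes form a single character (U+FFAD) ...
utf8_prefix = data[:3].decode("utf-8")

# ... and the trailing 0xde starts a sequence that never finishes.
try:
    data.decode("utf-8")
    utf8_strict_ok = True
except UnicodeDecodeError:
    utf8_strict_ok = False

print([hex(cp) for cp in latin1_code_points])  # ['0xef', '0xbe', '0xad', '0xde']
print(cp1252_text == latin1_text)              # True
print(ascii(utf8_prefix))                      # '\uffad'
print(utf8_strict_ok)                          # False
```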

What to learn: be explicit about whether a string contains unicode or bytes, and beware of encodings!

Answer 2

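To get consistent behavior on Python 2 and Python 3, you need to be explicit about the intended output. If you want AAAAAAAAAAAAAAAAAAAAﾭ, then the \xde is garbage; if you want AAAAAAAAAAAAAAAAAAAAï¾Þ, then the \xad is garbage. Either way, the "solution" for printing what you have is to explicitly use bytes literals and decode them with the desired encoding, ignoring errors. So to get AAAAAAAAAAAAAAAAAAAAﾭ (interpreting as UTF-8), you'd do: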

print((b"A"*20+b"\xef\xbe\xad\xde").decode('utf-8', errors='ignore'))

while to get AAAAAAAAAAAAAAAAAAAAï¾Þ, you'd do:

# cp1252 can be used instead of latin-1, depending on intent; they overlap in this case
print((b"A"*20+b"\xef\xbe\xad\xde").decode('latin-1', errors='ignore'))

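Importantly, note the leading b on the literals; they're recognized and ignored on Python 2.7 (unless from __future__ import unicode_literals is in effect, in which case they're needed just like in Python 3), and on Python 3, they make the literals bytes literals (no particular encoding assumed) rather than str literals, so you can decode them with the encoding you want. Either way, you end up with raw bytes, which can then be decoded with the preferred encoding, ignoring errors.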
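As a rough illustration of what the b prefix changes, the sketch below compares the two kinds of literal. It assumes Python 3, and the variable names are only for illustration.

```python
text_literal = "\xef\xbe\xad\xde"    # Python 3: a str of four code points
bytes_literal = b"\xef\xbe\xad\xde"  # four raw byte values, no encoding implied

# The str holds U+00EF, U+00BE, U+00AD, U+00DE. Encoding it as UTF-8
# turns each code point into two bytes; Latin-1 gives one byte each.
as_utf8 = text_literal.encode("utf-8")
as_latin1 = text_literal.encode("latin-1")

print(type(text_literal).__name__, type(bytes_literal).__name__)  # str bytes
print(as_utf8)                     # b'\xc3\xaf\xc2\xbe\xc2\xad\xc3\x9e'
print(as_latin1 == bytes_literal)  # True
```

The eight-byte UTF-8 form is what a Python 3 print sends to a UTF-8 terminal for the unprefixed literal, which is consistent with the ï¾Þ output reported for Python3 on Kali.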

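Note that ignoring errors is usually the wrong thing to do; you're dropping data on the floor. 0xDEADBEEF isn't guaranteed to produce a useful byte string in any given encoding, and if that's not your real data, you're probably risking errors by silently ignoring undecodable data.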
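To see what "dropping data on the floor" means here, this sketch decodes the same bytes with three error handlers. It assumes Python 3; "replace" is shown only as one alternative that at least leaves a visible marker, not as a recommendation from the answer.

```python
data = b"A" * 20 + b"\xef\xbe\xad\xde"

# "ignore" silently drops the stray 0xde byte.
ignored = data.decode("utf-8", errors="ignore")

# "replace" keeps a visible marker (U+FFFD) where decoding failed.
replaced = data.decode("utf-8", errors="replace")

# "strict" is the default and reports where the problem is.
try:
    data.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

print(len(data), len(ignored), len(replaced))  # 24 21 22
print(ascii(replaced[-2:]))                    # '\uffad\ufffd'
print(strict_error.start)                      # 23
```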

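If you want to write the raw bytes and let whatever consumes stdout interpret them however it likes, you need to drop below the level of print, since print on Python 3 is entirely str based. To write raw bytes on Python 3, you'd use sys.stdout.buffer (sys.stdout is text based, sys.stdout.buffer is the underlying buffered, byte-oriented stream it wraps); you'll also need to add the newline manually (if you want one):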

sys.stdout.buffer.write(b"A"*20+b"\xef\xbe\xad\xde\n")

vs. on Python 2, where sys.stdout isn't an encoding wrapper:

sys.stdout.write(b"A"*20+b"\xef\xbe\xad\xde\n")

For portable code, you can grab the "raw stdout" up front and use that:

import sys

# Put this at the top of your file so you don't have to constantly recheck/reacquire
# Gets sys.stdout.buffer if it exists, sys.stdout otherwise
bstdout = getattr(sys.stdout, 'buffer', sys.stdout)

# Works on both Py2 and Py3
bstdout.write(b"A"*20+b"\xef\xbe\xad\xde\n")
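If you use this in more than one place, the same getattr trick can be wrapped in a small helper. This is only a sketch: write_raw and its stream parameter are hypothetical names of mine, and the in-memory stream is there just so the result can be checked without a terminal.

```python
import io
import sys


def write_raw(data, stream=None):
    """Write bytes to the byte-oriented layer of stream (default: sys.stdout)."""
    if stream is None:
        stream = sys.stdout
    # Python 3 text streams expose .buffer; Python 2's sys.stdout does not.
    raw = getattr(stream, "buffer", stream)
    raw.write(data)
    raw.flush()


# Demonstration against an in-memory text stream instead of a real terminal.
sink = io.BytesIO()
text_stream = io.TextIOWrapper(sink, encoding="utf-8")
write_raw(b"A" * 20 + b"\xef\xbe\xad\xde\n", text_stream)
captured = sink.getvalue()

print(captured == b"A" * 20 + b"\xef\xbe\xad\xde\n")  # True
```

If you mix print with raw writes, flush sys.stdout before writing to the raw stream; otherwise buffered text can come out after the bytes.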
