Question

plaintext = input("Please enter the text you want to compress")
filename = input("Please enter the desired filename")
with gzip.open(filename + ".gz", "wb") as outfile:
    outfile.write(plaintext)

The Python code above gives the following error:

Traceback (most recent call last):
  File "C:/Users/Ankur Gupta/Desktop/Python_works/gzip_work1.py", line 33, in <module>
    compress_string()
  File "C:/Users/Ankur Gupta/Desktop/Python_works/gzip_work1.py", line 15, in compress_string
    outfile.write(plaintext)
  File "C:\Python32\lib\gzip.py", line 312, in write
    self.crc = zlib.crc32(data, self.crc) & 0xffffffff
TypeError: 'str' does not support the buffer interface

Answer 1

If you are using Python 3.x, then string is not the same type as in Python 2.x; you must convert it to bytes (encode it).

import gzip

plaintext = input("Please enter the text you want to compress")
filename = input("Please enter the desired filename")
with gzip.open(filename + ".gz", "wb") as outfile:
    # A file opened in binary mode only accepts bytes, so encode the text first
    outfile.write(bytes(plaintext, 'UTF-8'))

Also, do not use variable names like string or file, since those are names of a module or a function.

Edit @Tom

Yes, non-ASCII text is also compressed and decompressed correctly. I use Polish letters with UTF-8 encoding:

import gzip

plaintext = 'Polish text: ąćęłńóśźżĄĆĘŁŃÓŚŹŻ'
filename = 'foo.gz'
with gzip.open(filename, 'wb') as outfile:
    outfile.write(bytes(plaintext, 'UTF-8'))
with gzip.open(filename, 'r') as infile:
    # read() returns bytes here, so decode them back into text
    outfile_content = infile.read().decode('UTF-8')
print(outfile_content)
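
To check the same round trip without creating a file, one option is the in-memory gzip.compress and gzip.decompress functions from the standard library. This is only a sketch; the helper name roundtrip is made up for illustration.

```python
import gzip

def roundtrip(text, encoding="utf-8"):
    """Hypothetical helper: encode, compress, decompress and decode text."""
    raw = text.encode(encoding)          # str -> bytes
    packed = gzip.compress(raw)          # bytes -> gzip-compressed bytes
    return gzip.decompress(packed).decode(encoding)

plaintext = 'Polish text: ąćęłńóśźżĄĆĘŁŃÓŚŹŻ'
print(roundtrip(plaintext) == plaintext)
```

If the comparison prints True, the non-ASCII characters survived compression unchanged.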

Answer 2

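There is an easier solution to this problem.

You just need to add a t to the mode so that it becomes wt. This makes Python open the file as a text file rather than a binary one, so write() accepts a str directly. Note that text mode for gzip.open requires Python 3.3 or later.

The complete program becomes: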

import gzip

plaintext = input("Please enter the text you want to compress")
filename = input("Please enter the desired filename")
with gzip.open(filename + ".gz", "wt") as outfile:
    outfile.write(plaintext)
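
In text mode the encoding falls back to a platform default unless you pass one, so for non-ASCII text it may be safer to name it explicitly. A minimal sketch, assuming UTF-8 is wanted; the temporary directory and the file name example.gz are placeholders chosen for the example:

```python
import gzip
import os
import tempfile

plaintext = "Polish text: ąćęłńóśźż"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.gz")  # placeholder file name
    # "wt" opens the compressed file for writing text
    with gzip.open(path, "wt", encoding="utf-8") as outfile:
        outfile.write(plaintext)
    # "rt" reads it back as text using the same encoding
    with gzip.open(path, "rt", encoding="utf-8") as infile:
        restored = infile.read()

print(restored == plaintext)
```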

Answer 3

You cannot serialize a Python 3 'string' to bytes without an explicit conversion to some encoding.

outfile.write(plaintext.encode('utf-8'))

is probably what you want. This also works in both Python 2.x and 3.x.
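
If the same code may receive either text or data that is already bytes, a small guard avoids encoding twice. The sketch below is one way to do it; to_bytes is a hypothetical helper name, not part of gzip or the standard library.

```python
def to_bytes(data, encoding="utf-8"):
    """Return data as bytes, encoding it first if it is text."""
    if isinstance(data, bytes):
        # Already bytes (this is also what a plain str is in Python 2)
        return data
    return data.encode(encoding)

print(to_bytes("attack at dawn"))
print(to_bytes(b"attack at dawn"))
```

The result can then be passed to outfile.write on a file opened in binary mode.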

Answer 4

For Python 3.x you can convert your text to raw bytes with:

bytes("my data", "encoding")

For example:

bytes("attack at dawn", "utf-8")

The object returned will work with outfile.write.
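
As a quick illustration, bytes(text, encoding) gives the same result as text.encode(encoding), and the chosen encoding decides how many bytes each character takes. The sample character and the ISO-8859-2 codec are just picked for the example:

```python
text = "ą"

as_utf8 = bytes(text, "utf-8")
as_latin2 = bytes(text, "iso-8859-2")

# Both spellings produce the same bytes object
print(as_utf8 == text.encode("utf-8"))

# The same one-character string has a different byte length per encoding
print(len(text), len(as_utf8), len(as_latin2))
```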

Answer 5

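This problem commonly occurs when switching from py2 to py3. In py2, plaintext is both a string and a byte array type. In py3, plaintext is only a string, and the method outfile.write() actually takes a byte array when outfile is opened in binary mode, so an exception is raised. Change the input to plaintext.encode('utf-8') to fix the problem. Read on if this bothers you.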

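In py2, the declaration for file.write made it look as if you passed in a string: file.write(str). Actually you were passing in a byte array, and you should have been reading the declaration like this: file.write(bytes). Read that way, the problem is simple: file.write(bytes) needs a bytes type, and in py3, to get bytes out of a str, you convert it: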

py3>> outfile.write(plaintext.encode('utf-8'))

Why did the py2 docs declare that file.write took a string? In py2 the distinction in the declaration did not matter because:

py2>> str==bytes         #str and bytes aliased a single hybrid class in py2
True

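The str-bytes class of py2 has methods and constructors that make it behave like a string class in some ways and like a byte array class in others. Convenient for file.write, isn't it?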

py2>> plaintext='my string literal'
py2>> type(plaintext)
str                              #is it a string or is it a byte array? it's both!

py2>> outfile.write(plaintext)   #can use plaintext as a byte array

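Why did py3 break this nice system? Because in py2, basic string functions did not work for the rest of the world. Want to measure the length of a word that contains a non-ASCII character?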

py2>> len('¡no')        #length of string=3, length of UTF-8 byte array=4, since with a variable-length encoding non-ASCII chars take 2-4 bytes
4                       #always gives bytes.len not str.len

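All this time you thought you were asking for the len of a string in py2, you were getting the length of the byte array from the encoding. That ambiguity is the fundamental problem with double-duty classes: which version of any method call do you implement?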

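The good news is that py3 fixes this problem. It disentangles the str and bytes classes. The str class has string-like methods, and the separate bytes class has byte array methods: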

py3>> len('¡ok')       #string
3
py3>> len('¡ok'.encode('utf-8'))     #bytes
4

Hopefully knowing this helps demystify the issue and makes the migration pain a little easier to bear.
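
To see the py3 behavior end to end, the sketch below writes first a str and then bytes to a gzip stream opened in binary mode. It uses an in-memory buffer instead of a real file, and write_gzip is a hypothetical helper name. The wording of the error message differs between Python versions, so only the exception type is checked.

```python
import gzip
import io

def write_gzip(data):
    """Compress data into an in-memory buffer and return the gzip bytes."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as outfile:
        outfile.write(data)
    return buffer.getvalue()

plaintext = "¡ok"

# Passing a str to a binary-mode stream raises TypeError in py3
try:
    write_gzip(plaintext)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Encoding the text first gives write() the bytes it expects
packed = write_gzip(plaintext.encode("utf-8"))
restored = gzip.decompress(packed).decode("utf-8")

print(error_name, restored == plaintext)
```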

Answer 6

>>> s = bytes("s","utf-8")
>>> print(s)
b's'
>>> s = s.decode("utf-8")
>>> print(s)
s

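This is useful if you want to get rid of the annoying 'b' character. If anyone has a better idea, please suggest it or feel free to edit my answer here. I'm just a newbie.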
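
The b prefix is not part of the data; it is only how Python displays a bytes object. A short sketch of the difference, using the same one-letter value as above:

```python
raw = bytes("s", "utf-8")

# repr() of a bytes object includes the b'...' notation
print(repr(raw))

# decode() turns the bytes back into a str, which prints without the prefix
text = raw.decode("utf-8")
print(text)
print(type(raw).__name__, type(text).__name__)
```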

Answer 7

For Django unit tests based on django.test.TestCase, I changed my Python2 syntax:

def test_view(self):
    response = self.client.get(reverse('myview'))
    self.assertIn(str(self.obj.id), response.content)
    ...

to use the Python3 .decode('utf8') syntax:

def test_view(self):
    response = self.client.get(reverse('myview'))
    self.assertIn(str(self.obj.id), response.content.decode('utf8'))
    ...
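
The same mismatch can be reproduced without Django. In the sketch below, content stands in for response.content, which is a bytes object; the markup and the id 42 are invented for the example. Besides decoding the content, encoding the value being searched for is another option.

```python
content = b'<li data-id="42">item</li>'  # stand-in for response.content
obj_id = 42

# A str cannot be searched for inside bytes in Python 3
try:
    found = str(obj_id) in content
except TypeError:
    found = None

# Option 1: decode the bytes and compare str with str
found_decoded = str(obj_id) in content.decode("utf8")

# Option 2: encode the needle and compare bytes with bytes
found_encoded = str(obj_id).encode("utf8") in content

print(found, found_decoded, found_encoded)
```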
