Question

I have read in an XML email attachment with

bytes_string=part.get_payload(decode=False)

The payload comes in as a byte string, as my variable name suggests.

I am trying to use the recommended Python 3 approach to turn this byte string into a regular string that I can manipulate.

The example shows:

str(b'abc','utf-8')

How can I apply the b (bytes) keyword argument to my variable bytes_string and use the recommended approach?

The way I tried doesn't work:

str(bbytes_string, 'utf-8')

Answer 1

You're almost right on the last line. You want

str(bytes_string, 'utf-8')

because bytes_string is already of type bytes, the same type as b'abc'.
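A quick check, assuming the payload really is UTF-8-encoded bytes (the sample value below is made up), shows that the b prefix belongs only on literals. Once the bytes are in a variable you pass the variable itself, and str(x, 'utf-8') gives the same result as x.decode('utf-8'):

```python
# Hypothetical sample payload; in the question this would come from the email part.
bytes_string = b'<note><to>Tove</to></note>'

# The b prefix is literal syntax only; the variable already holds a bytes object.
assert isinstance(bytes_string, bytes)

text_via_str = str(bytes_string, 'utf-8')       # the form from the question's example
text_via_decode = bytes_string.decode('utf-8')  # the equivalent method call

print(text_via_str)                     # <note><to>Tove</to></note>
print(text_via_str == text_via_decode)  # True
```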

Answer 2

Call decode() on a bytes instance to get the text which it encodes.

text = bytes_string.decode('utf-8')  # call decode() on the instance; naming the result str would shadow the built-in
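In the standard library's email package, it is get_payload(decode=True) that undoes the transfer encoding (base64 or quoted-printable) and returns bytes; with decode=False you typically get the still-encoded payload back as a str. A minimal end-to-end sketch, building a made-up message with an XML attachment and assuming UTF-8 whenever the part declares no charset parameter:

```python
from email.message import EmailMessage

# Build a small test message with a hypothetical XML attachment.
msg = EmailMessage()
msg.set_content("See the attached XML.")
msg.add_attachment(b'<?xml version="1.0"?><root>caf\xc3\xa9</root>',
                   maintype='application', subtype='xml', filename='data.xml')

texts = []
for part in msg.walk():
    if part.get_content_type() != 'application/xml':
        continue
    raw = part.get_payload(decode=True)          # bytes, base64 already removed
    charset = part.get_content_charset('utf-8')  # assumed fallback when no charset param
    texts.append(raw.decode(charset))            # bytes -> str

print(texts[0])  # <?xml version="1.0"?><root>café</root>
```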

Answer 3

Update: to get a string without the b prefix and the surrounding quotes:

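Since your bytes may contain characters that the 'utf-8' codec cannot decode, it is better to call str without any extra arguments and slice off the b'...' wrapper: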

bad_bytes = b'\x02-\xdfI#)'
text = str(bad_bytes)[2:-1]  # str() returns the repr b'...'; the slice drops b' and the closing '

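If you pass 'utf-8' as the encoding for these particular bytes, you will get a UnicodeDecodeError.

text is now an ordinary Python 3 str that you can work with without worrying about decoding errors. Note, however, that it holds the bytes' repr, so undecodable bytes appear as literal escape sequences such as \xdf rather than as decoded characters.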
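The difference between the two approaches is easy to see side by side. This sketch reuses bad_bytes from above and compares the str() slicing trick with decode(..., errors='backslashreplace'), a standard error handler that keeps the valid UTF-8 and escapes only the undecodable byte:

```python
bad_bytes = b'\x02-\xdfI#)'

# str() on bytes returns their repr, printed as b'\x02-\xdfI#)'; the slice drops b' and '.
repr_text = str(bad_bytes)[2:-1]

# Strict UTF-8 fails: 0xdf starts a two-byte sequence, but 'I' is not a continuation byte.
try:
    bad_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    print(exc.start)  # 2

# backslashreplace decodes what it can and escapes only the bad byte.
decoded_text = bad_bytes.decode('utf-8', errors='backslashreplace')

print(len(repr_text), len(decoded_text))  # 12 9
print(repr_text.startswith('\\x02'))      # True: a literal backslash sequence
print(decoded_text.startswith('\x02'))    # True: the real control character
```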

Answer 4

"How to filter (skip) non-UTF8 characters from array?"

To address this comment on @uname01's post and the OP, ignore the errors:

Code

>>> b'\x80abc'.decode("utf-8", errors="ignore")
'abc'

Details

From the docs, here are more examples using the same errors argument:

>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "backslashreplace")
'\\x80abc'
>>> b'\x80abc'.decode("utf-8", "strict")  
Traceback (most recent call last):
    ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
  invalid start byte

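The errors argument specifies the response when the input cannot be converted according to the encoding's rules. Legal values for this argument are 'strict' (raise a UnicodeDecodeError exception), 'replace' (use U+FFFD, REPLACEMENT CHARACTER), or 'ignore' (just leave the character out of the Unicode result). The 'backslashreplace' handler used above is also available; it inserts a \xNN escape sequence for each undecodable byte.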
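One way to combine these handlers, sketched below with a hypothetical helper named to_text, is to decode strictly first and fall back to 'replace' only when that fails. Clean input is then never altered, and bad input stays visible as U+FFFD instead of vanishing as it would with 'ignore':

```python
def to_text(data: bytes, encoding: str = 'utf-8') -> str:
    """Hypothetical helper: strict decode, falling back to 'replace' on bad bytes."""
    try:
        return data.decode(encoding)  # 'strict' is the default error handler
    except UnicodeDecodeError as exc:
        # exc.start and exc.end locate the first offending byte range.
        print(f"bad byte(s) at {exc.start}:{exc.end}: {exc.reason}")
        return data.decode(encoding, errors='replace')


print(to_text(b'abc'))        # abc
result = to_text(b'\x80abc')  # reports: bad byte(s) at 0:1: invalid start byte
print(result == '\ufffdabc')  # True
```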

How do I convert a Python 3 byte-string variable into a regular string?
