I have read in an XML email attachment with
bytes_string=part.get_payload(decode=False)
The payload comes in as a byte string, as my variable name suggests.
I am trying to use the recommended Python 3 approach to turn this string into a usable string that I can manipulate.
The example shows:
str(b'abc','utf-8')
How can I apply the b (bytes) keyword argument to my variable bytes_string and use the recommended approach?
The way I tried doesn't work:
str(bbytes_string, 'utf-8')
Answer 0 (score: 145)
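You were almost right in the last line. You want
str(bytes_string, 'utf-8')
because bytes_string is already of type bytes, the same type as b'abc'. The b prefix is only syntax for writing a bytes literal in source code; it is not something you apply to a variable.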
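A minimal sketch of the point above, assuming a made-up payload in place of the real attachment (the contents of bytes_string here are purely illustrative). It shows that str(obj, 'utf-8') and obj.decode('utf-8') give the same result for a bytes object:

```python
# Hypothetical stand-in for the attachment payload; any bytes object works.
bytes_string = b'<note>caf\xc3\xa9</note>'

# The b prefix only marks a bytes literal in source code. A variable that
# already holds bytes is passed to str() as-is, together with the encoding.
text = str(bytes_string, 'utf-8')

# Equivalent spelling using the bytes method.
same_text = bytes_string.decode('utf-8')

print(type(bytes_string).__name__, '->', type(text).__name__)
print(text == same_text)
```

Either spelling is fine; which one to use is mostly a matter of taste.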
Answer 1 (score: 39)
Call decode() on a bytes instance to get the text which it encodes.
text = bytes_string.decode('utf-8')
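For the email case in the question, a rough sketch might look like the following. The message, the filename demo.xml and the XML content are all invented for illustration. Note one assumption that differs from the question's code: get_payload(decode=True) is used here, since that is the call that returns bytes for a non-multipart part, whereas decode=False returns the payload as a str.

```python
from email import message_from_bytes
from email.message import EmailMessage

# Build a small message with an XML attachment (made-up content).
msg = EmailMessage()
msg['Subject'] = 'demo'
msg.set_content('see attachment')
msg.add_attachment(b'<root>hello</root>', maintype='application',
                   subtype='xml', filename='demo.xml')

# Round-trip through bytes, as if the message had just been received.
parsed = message_from_bytes(msg.as_bytes())

raw = None
xml_text = None
for part in parsed.walk():
    if part.get_filename() == 'demo.xml':
        # decode=True undoes the transfer encoding and returns bytes.
        raw = part.get_payload(decode=True)
        # Turn the bytes into text; UTF-8 is assumed for this demo.
        xml_text = raw.decode('utf-8')

print(type(raw).__name__, xml_text)
```

In a real message the attachment may declare its own charset, so UTF-8 is only a reasonable default here, not a guarantee.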
Answer 2 (score: 4)
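Update: to get the text without the leading b and the surrounding quotes.
Since your data may contain bytes that the 'utf-8' encoding does not recognize, it can be safer to call str without any additional arguments and slice off the b and the quotes: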
bad_bytes = b'\x02-\xdfI#)'
text = str( bad_bytes )[2:-1]
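If you pass the 'utf-8' argument for these specific bytes, you should get an error.
Note that text here holds the escaped repr of the bytes (the byte \x02 becomes the four characters \, x, 0, 2), so it can be printed or stored as UTF-8 without concern, but it is not the decoded content.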
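To make the trade-off concrete, here is a small sketch comparing the repr-slicing trick with actual decoding. The variable names repr_text, strict_failed and replaced are introduced only for this illustration:

```python
bad_bytes = b'\x02-\xdfI#)'

# Slicing the repr keeps the escape sequences as literal characters,
# so the result is longer than the original byte sequence.
repr_text = str(bad_bytes)[2:-1]

# Strict UTF-8 decoding rejects these bytes (0xdf starts a two-byte
# sequence, but the next byte is not a continuation byte).
try:
    bad_bytes.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Decoding with an error handler yields real text, with the invalid
# byte replaced by U+FFFD.
replaced = bad_bytes.decode('utf-8', errors='replace')

print(repr(repr_text), len(bad_bytes), len(repr_text))
print(strict_failed, repr(replaced))
```

If the goal is readable text rather than a debug-style dump, decoding with an explicit errors handler is usually the clearer choice.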
Answer 3 (score: 3)
How do I filter (skip) non-UTF8 characters from the array?
To address this comment on @uname01's post and the OP's question, ignore the errors:
Code
>>> b'\x80abc'.decode("utf-8", errors="ignore")
'abc'
Details
From the docs, here are more examples using the same errors parameter:
>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "backslashreplace")
'\\x80abc'
>>> b'\x80abc'.decode("utf-8", "strict")
Traceback (most recent call last):
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
invalid start byte
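The errors argument specifies the response when the input string can't be converted according to the encoding's rules. Legal values for this argument are 'strict' (raise a UnicodeDecodeError exception), 'replace' (use U+FFFD, REPLACEMENT CHARACTER), 'ignore' (just leave the character out of the Unicode result), or 'backslashreplace' (insert a \xNN escape sequence).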
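The handlers described above can be compared side by side with a short sketch. The helper decode_with is a hypothetical name used only for this example, not part of any library:

```python
def decode_with(data: bytes, errors: str) -> str:
    """Decode data as UTF-8 using the given error handler (illustrative helper)."""
    return data.decode('utf-8', errors=errors)

sample = b'\x80abc'

# Collect the result of each lenient handler for the same input.
results = {mode: decode_with(sample, mode)
           for mode in ('ignore', 'replace', 'backslashreplace')}

for mode, text in results.items():
    print(f'{mode:>16}: {text!r}')

# 'strict' is the default and raises instead of returning text.
try:
    decode_with(sample, 'strict')
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True
print('strict raised:', strict_raised)
```

Which handler fits depends on whether dropping, marking, or preserving the bad bytes matters more for the data at hand.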