How do I convert a Python 3 byte-string variable into a regular string?

Asked: 2015-06-25 18:29:15

Tags: string python-3.x type-conversion bytestring

I have read in an XML email attachment with

bytes_string=part.get_payload(decode=False)

The payload comes in as a byte string, as my variable name suggests.

I am trying to use the recommended Python 3 approach to turn this string into a usable string that I can manipulate.

The example shows:

str(b'abc','utf-8')

How can I apply the b (bytes) keyword argument to my variable bytes_string and use the recommended approach?

The way I tried doesn't work:

str(bbytes_string, 'utf-8')

4 answers:

Answer 0 (score: 145)

You were almost right in the last line. You want

str(bytes_string, 'utf-8')

because bytes_string has type bytes, the same type as b'abc'.
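As a minimal sketch of this point: the b prefix is part of the syntax for bytes literals, not something that can be applied to a variable, so a variable that already holds bytes is passed to str() as it is. The sample XML payload below is made up for illustration and stands in for the email attachment.

```python
# Hypothetical payload standing in for the attachment (an assumption).
bytes_string = b'<note>caf\xc3\xa9</note>'

# Pass the variable directly; no b prefix is needed (or possible) here.
text = str(bytes_string, 'utf-8')

print(type(bytes_string).__name__)       # bytes
print(type(text).__name__)               # str
print(text == '<note>caf\u00e9</note>')  # True
```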

Answer 1 (score: 39)

Call decode() on a bytes instance to get the text which it encodes.

text = bytes_string.decode()  # the encoding defaults to 'utf-8'
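A short sketch of decode() in use, assuming the bytes are valid UTF-8 (the sample value is made up). It also shows that decode() and the two-argument str() call produce the same text.

```python
data = b'caf\xc3\xa9'  # sample bytes (an assumption): UTF-8 for "cafe" with an accented e

text_default = data.decode()          # encoding defaults to 'utf-8'
text_explicit = data.decode('utf-8')  # same result, encoding spelled out
text_via_str = str(data, 'utf-8')     # the str() form gives the same text

print(text_default == 'caf\u00e9')                    # True
print(text_default == text_explicit == text_via_str)  # True
```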

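Answer 2 (score: 4)

UPDATE:

To get a string without the b prefix and the quotes:

Because your data may contain characters that the 'utf-8' encoding cannot recognize, it is better to use just str without any additional arguments: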

bad_bytes = b'\x02-\xdfI#)'
text = str( bad_bytes )[2:-1]

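If you add the 'utf-8' argument for these specific bytes, you should get an error.

As the Python 3 standard says, text is now an ordinary (Unicode) str that can be used without encoding concerns. Note that it holds the escaped repr of the bytes (literal backslash sequences such as \x02), not decoded characters.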
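For comparison, here is a small sketch of what the slicing trick produces. Calling str() on bytes without an encoding returns the repr of the bytes object, so the result contains literal backslash escapes rather than decoded characters. Decoding with errors='replace' is shown alongside as one possible alternative, not as this answer's own method.

```python
bad_bytes = b'\x02-\xdfI#)'

# str() without an encoding returns the repr ("b'...'"); the slice strips
# the leading b' and the trailing quote.
text = str(bad_bytes)[2:-1]
print(text)                       # \x02-\xdfI#)
print(len(bad_bytes), len(text))  # 6 12

# Strict UTF-8 decoding fails on these bytes.
try:
    bad_bytes.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)  # True

# Alternative (an assumption about what is wanted): keep the valid bytes
# and substitute U+FFFD for the undecodable one.
replaced = bad_bytes.decode('utf-8', errors='replace')
print(replaced == '\x02-\ufffdI#)')  # True
```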

Answer 3 (score: 3)

  

"How to filter (skip) non-UTF8 characters from the array?"

To address this comment on @uname01's post and the OP, ignore the errors:

Code

>>> b'\x80abc'.decode("utf-8", errors="ignore")
'abc'

Details

From the docs, here are more examples using the same errors parameter:

>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "backslashreplace")
'\\x80abc'
>>> b'\x80abc'.decode("utf-8", "strict")  
Traceback (most recent call last):
    ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
  invalid start byte
  

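The errors argument specifies the response when the input bytes can't be converted according to the encoding's rules. Legal values for this argument are 'strict' (raise a UnicodeDecodeError exception), 'replace' (use U+FFFD, REPLACEMENT CHARACTER), or 'ignore' (just leave the character out of the Unicode result). The 'backslashreplace' handler used in the example above inserts a \xNN escape sequence instead.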
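One way these handlers might be combined in practice is sketched below. The helper name decode_lenient is hypothetical (it is not part of Python or of this answer): it tries strict decoding first, so the caller can tell whether any bytes had to be replaced.

```python
from typing import Tuple


def decode_lenient(data: bytes, encoding: str = 'utf-8') -> Tuple[str, bool]:
    """Hypothetical helper: return (text, lossy).

    lossy is True when strict decoding failed and undecodable bytes
    were replaced with U+FFFD.
    """
    try:
        return data.decode(encoding, errors='strict'), False
    except UnicodeDecodeError:
        return data.decode(encoding, errors='replace'), True


clean_text, clean_lossy = decode_lenient(b'abc')
dirty_text, dirty_lossy = decode_lenient(b'\x80abc')

print(clean_text, clean_lossy)                  # abc False
print(dirty_text == '\ufffdabc', dirty_lossy)   # True True
```

Swapping 'replace' for 'ignore' or 'backslashreplace' in the fallback branch gives the other behaviours shown in the examples above.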