Question

I'm using this code to get standard output from an external program:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]

The communicate() method returns an array of bytes:

>>> command_stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

However, I'd like to work with the output as a normal Python string, so that I could print it like this:

>>> print(command_stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

I thought that's what the binascii.b2a_qp() method is for, but when I tried it, I got the same byte array again:

>>> binascii.b2a_qp(command_stdout)
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

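Does anybody know how to convert the bytes value back to a string? I mean, using the "batteries" instead of doing it manually. And I'd like it to work with Python 3.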

Answer 1

You need to decode the bytes object to produce a string:

>>> b"abcde"
b'abcde'

# utf-8 is used here because it is a very common encoding, but you
# need to use the encoding your data is actually in.
>>> b"abcde".decode("utf-8") 
'abcde'
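A minimal round-trip sketch (Python 3). Encoding a str gives bytes, and decoding those bytes with the same codec gives the original text back. Decoding with a different codec may silently produce other characters. The sample word is just an example:

```python
text = "café"

# str -> bytes with an explicit encoding
data = text.encode("utf-8")
print(data)  # b'caf\xc3\xa9'

# bytes -> str with the SAME encoding restores the text
print(data.decode("utf-8"))  # café

# A mismatched codec does not necessarily fail; it can yield the wrong characters
print(data.decode("latin-1"))  # cafÃ©
```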

Answer 2

I think this way is easy:

>>> byte_values = [112, 52, 52]
>>> "".join(map(chr, byte_values))
'p44'
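chr() maps each integer to the code point with the same number, so this is effectively a Latin-1 decode. It is only correct when the data really is Latin-1 or plain ASCII. The following sketch compares the two approaches; the extra byte 233 (0xE9) is added here purely for illustration:

```python
byte_values = [112, 52, 52, 233]  # 233 == 0xE9

manual = "".join(map(chr, byte_values))
builtin = bytes(byte_values).decode("latin-1")
print(manual, manual == builtin)  # p44é True

# The same bytes are not valid UTF-8 (0xE9 starts a 3-byte sequence)
try:
    bytes(byte_values).decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```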

Answer 3

You need to decode the byte string and turn it into a character (Unicode) string.

b'hello'.decode(encoding)

or, in Python 3:

str(b'hello', encoding)
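Both spellings use the same codec machinery. This quick check assumes UTF-8 input. Note that str() without an encoding does something different:

```python
raw = b"hello \xe2\x82\xac"  # UTF-8 bytes for "hello €"

via_method = raw.decode("utf-8")
via_constructor = str(raw, "utf-8")
print(via_method == via_constructor)  # True

# Careful: str() WITHOUT an encoding returns the repr, not decoded text
print(str(raw))  # b'hello \xe2\x82\xac'
```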

Answer 4

If you don't know the encoding, then to read binary input into a string in a Python 3 and Python 2 compatible way, use the ancient MS-DOS cp437 encoding:

import sys

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:  # stream: any iterable of byte lines, e.g. a file opened in 'rb' mode
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('cp437'))
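cp437 assigns a character to every one of the 256 byte values, so this decode cannot raise. Encoding back with cp437 restores the original bytes. Here is a small check of that property (Python 3):

```python
all_bytes = bytes(range(256))

text = all_bytes.decode("cp437")          # never raises: every byte is mapped
print(len(text))                           # 256
print(text.encode("cp437") == all_bytes)   # True: lossless round trip
```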

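Because the encoding is unknown, expect non-English symbols to be translated into cp437 characters. English characters are not translated, because they match in most single-byte encodings and in UTF-8.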

Decoding arbitrary binary input as UTF-8 is unsafe, because you may get this:

>>> b'\x00\x01\xffsd'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte

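The same applies to latin-1, which was popular (the default?) for Python 2. See the missing points in its Codepage Layout: that is where Python chokes with the infamous ordinal not in range error. (Decoding with Python's latin-1 codec itself never raises, since it maps all 256 byte values. The ordinal not in range error comes from encoding text that latin-1 cannot represent, or from the ascii codec.)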

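UPDATE 20150604: Python 3 has the surrogateescape error handler for turning binary data into text without data loss or crashes. It still needs conversion tests, [binary] -> [str] -> [binary], to validate both performance and reliability.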
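This is a minimal [binary] -> [str] -> [binary] round-trip check with surrogateescape. It uses the same bytes as the failing UTF-8 example above:

```python
original = b"\x00\x01\xffsd"

# bytes -> str: 0xFF is not valid UTF-8, so it becomes the lone surrogate U+DCFF
text = original.decode("utf-8", "surrogateescape")
print(ascii(text))  # '\x00\x01\udcffsd'  (ascii() avoids printing a raw surrogate)

# str -> bytes: the surrogate is turned back into the original byte
restored = text.encode("utf-8", "surrogateescape")
print(restored == original)  # True
```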

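UPDATE 20170116: Thanks to a comment by Nearoo, there is also the possibility to slash-escape all unknown bytes with the backslashreplace error handler. That works only on Python 3 (for decoding, 3.5+), so even with this workaround you will still get inconsistent output from different Python versions:

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('utf-8', 'backslashreplace'))

See https://docs.python.org/3/howto/unicode.html#python-s-unicode-support for details.

UPDATE 20170119: I decided to implement a slash-escaping decode that works for both Python 2 and Python 3. It should be slower than the cp437 solution, but it should produce identical results on every Python version.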
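The implementation itself is not reproduced here. The sketch below shows the general idea; it is Python 3 only, and the handler name slashescape_demo is made up for this example. It registers a custom error handler with codecs.register_error. For UTF-8 decoding, it yields the same text as the built-in backslashreplace:

```python
import codecs

def slash_escape(err):
    """Replace each undecodable byte with a literal \\xNN escape."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    replacement = "".join("\\x%02x" % b for b in bad)
    # Resume decoding right after the bytes we just escaped
    return replacement, err.end

# Hypothetical handler name chosen for this example
codecs.register_error("slashescape_demo", slash_escape)

line = b"\x80abc"
print(line.decode("utf-8", "slashescape_demo"))  # \x80abc
print(line.decode("utf-8", "backslashreplace"))  # \x80abc
```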

Answer 5

In Python 3, the default encoding is "utf-8", so you can directly use:

b'hello'.decode()

which is equivalent to

b'hello'.decode(encoding="utf-8")

In Python 2, on the other hand, encoding defaults to the default string encoding. Thus, you should use:

b'hello'.decode(encoding)

where encoding is the encoding you want.

Note: support for keyword arguments was added in Python 2.7.
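The following quick check confirms that decode() with no arguments behaves like decode(encoding="utf-8") on Python 3, including for non-ASCII input. The sample text is arbitrary:

```python
data = "naïve €5".encode("utf-8")

# No argument: Python 3 decodes as UTF-8
assert data.decode() == data.decode(encoding="utf-8") == "naïve €5"

# Keyword arguments also work for the error policy
print(ascii(b"\xff".decode(errors="replace")))  # '\ufffd'
```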

Answer 6

I think you actually want this:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> command_text = command_stdout.decode(encoding='windows-1252')

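Aaron's answer was correct, except that you need to know WHICH encoding to use. I believe that Windows uses 'windows-1252'. It only matters if you have some unusual (non-ASCII) characters in your content, but then it will make a difference.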

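By the way, the fact that it DOES matter is the reason that Python moved to using two different types for binary and text data. It can't convert magically between them, because it doesn't know the encoding unless you tell it! The only way YOU would know is to read the Windows documentation (or read it here).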
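The next example shows why the choice matters: the same bytes decode differently, or not at all, depending on the codec. The byte values are only examples:

```python
data = b"caf\xe9 \x80"  # é and € in windows-1252

print(data.decode("windows-1252"))  # café €
print(data.decode("latin-1") == data.decode("windows-1252"))  # False: 0x80 differs

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 fails at byte", exc.start)  # utf-8 fails at byte 3
```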

Answer 7

Set universal_newlines to True, i.e.

command_stdout = Popen(['ls', '-l'], stdout=PIPE, universal_newlines=True).communicate()[0]
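This sketch shows what the flag does (it is the older name for text=True). The output is decoded with the locale's default encoding, and "\r\n" line endings are translated to "\n". The running Python interpreter (sys.executable) stands in for ls so the example runs anywhere:

```python
import sys
from subprocess import PIPE, Popen

# Child process that writes raw Windows-style line endings to its stdout
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'line1\\r\\nline2\\r\\n')"]

out = Popen(child, stdout=PIPE, universal_newlines=True).communicate()[0]
print(type(out).__name__)  # str
print(repr(out))           # 'line1\nline2\n'
```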

Answer 8

While @Aaron Maenpaa's answer just works, a user recently asked:

Is there any simpler way? 'fhand.read().decode("ASCII")' [...] It's so long!

You can use:

command_stdout.decode()

decode() has default arguments, the same as its codecs counterpart:

codecs.decode(obj, encoding='utf-8', errors='strict')
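The module-level codecs.decode() performs the same conversion as the bytes method, with the same defaults. Here is a brief check:

```python
import codecs

raw = b"total 0\n"

assert codecs.decode(raw) == raw.decode() == "total 0\n"

# Both accept an explicit encoding and error policy
replaced = codecs.decode(b"\xff", "utf-8", "replace")
print(ascii(replaced))  # '\ufffd'
```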

Answer 9

To interpret a byte sequence as text, you have to know the corresponding character encoding:


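unicode_text = bytestring.decode(character_encoding)

Example:

>>> b'\xc2\xb5'.decode('utf-8')
'µ'

The ls command may produce output that can't be interpreted as text. File names on Unix may be any sequence of bytes except slash b'/' and zero b'\0':

>>> open(bytes(range(0x100)).translate(None, b'\0/'), 'w').close()

Trying to decode such byte soup using the utf-8 encoding raises UnicodeDecodeError.

It can be worse. The decoding may fail silently and produce mojibake if you use a wrong, incompatible encoding:

>>> '—'.encode('utf-8').decode('cp1252')
'â€”'

The data is corrupted, but your program remains unaware that a failure has occurred.

In general, the character encoding to use is not embedded in the byte sequence itself. You have to communicate this information out-of-band. Some outcomes are more likely than others, which is why the chardet module exists: it can guess the character encoding. A single Python script may use multiple character encodings in different places.

The ls output can be converted to a Python string using the os.fsdecode() function. It succeeds even for undecodable filenames, because it uses sys.getfilesystemencoding() and the surrogateescape error handler on Unix:

import os
import subprocess

output = os.fsdecode(subprocess.check_output('ls'))

To get the original bytes, you could use os.fsencode().

If you pass the universal_newlines=True parameter, then subprocess uses locale.getpreferredencoding(False) to decode bytes. For example, it can be cp1252 on Windows.

To decode the byte stream on the fly, io.TextIOWrapper() could be used.

Different commands may use different character encodings for their output. For example, the dir internal command (cmd) may use cp437. To decode its output, you could pass the encoding explicitly (Python 3.6+):

output = subprocess.check_output('dir', shell=True, encoding='cp437')

The filenames may differ from os.listdir(), which uses the Windows Unicode API. For example, '\xb6' can be substituted with '\x14', because Python's cp437 codec maps b'\x14' to the control character U+0014 instead of U+00B6 (¶). To support filenames with arbitrary Unicode characters, see Decode PowerShell output possibly containing non-ASCII Unicode characters into a Python string.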
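This sketch shows on-the-fly decoding with io.TextIOWrapper. It wraps the child's binary stdout pipe so that lines arrive as str while the process is still running. The child command and the UTF-8 choice are assumptions made for the example:

```python
import io
import subprocess
import sys

# Stand-in child process that prints two UTF-8 lines (replace with your own command)
cmd = [sys.executable, "-c",
       "import sys; sys.stdout.buffer.write('\\u00b5 line\\nsecond\\n'.encode('utf-8'))"]

lines = []
with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
    # Decode incrementally instead of reading all the bytes first
    for line in io.TextIOWrapper(proc.stdout, encoding="utf-8"):
        lines.append(line)

print(lines)  # ['µ line\n', 'second\n']
```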

Answer 10

Since this question is actually asking about subprocess output, a more direct approach is available. Popen accepts an encoding keyword (in Python 3.6+):

>>> from subprocess import Popen, PIPE
>>> text = Popen(['ls', '-l'], stdout=PIPE, encoding='utf-8').communicate()[0]
>>> type(text)
<class 'str'>
>>> print(text)
total 0
-rw-r--r-- 1 wim badger 0 May 31 12:45 some_file.txt
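Here is a portable sketch of the same idea with non-ASCII output. The running Python interpreter stands in as a child that emits UTF-8 bytes; this command is an assumption, so swap in your own:

```python
import sys
from subprocess import PIPE, Popen

child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write('caf\\u00e9\\n'.encode('utf-8'))"]

# encoding= (Python 3.6+) makes communicate() return str instead of bytes
text = Popen(child, stdout=PIPE, encoding="utf-8").communicate()[0]
print(type(text).__name__, repr(text))  # str 'café\n'
```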

The general answer, as other users have said, is to decode the bytes to text:

>>> b'abcde'.decode()
'abcde'

If no argument is given, sys.getdefaultencoding() is used. If your data is not in sys.getdefaultencoding(), then you must specify the encoding explicitly in the decode call:

>>> b'caf\xe9'.decode('cp1250')
'café'

Answer 11

If you get the following when trying decode():

AttributeError: 'str' object has no attribute 'decode'

you can also specify the encoding type directly in a cast:

>>> my_byte_str
b'Hello World'

>>> str(my_byte_str, 'utf-8')
'Hello World'
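That AttributeError means the object is already a str. The str(obj, encoding) form only accepts bytes-like objects, so passing a str to it raises TypeError instead. A small demonstration:

```python
my_byte_str = b"Hello World"
print(str(my_byte_str, "utf-8"))  # Hello World

already_text = "Hello World"
try:
    str(already_text, "utf-8")
except TypeError as exc:
    print("TypeError:", exc)  # TypeError: decoding str is not supported
```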

Answer 12

I made a function to clean a list


Answer 13

When working with data from Windows systems (with \r\n line endings), my answer is

String = Bytes.decode("utf-8").replace("\r\n", "\n")

Why? Try this with a multiline Input.txt:

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8")
open("Output.txt", "w").write(String)

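All your line endings will be doubled (to \r\r\n), leading to extra empty lines, because writing in text mode on Windows turns each \n into \r\n again. Python's text-read functions usually normalize line endings so that strings use only \n. If you receive binary data from a Windows system, Python does not have a chance to do that. Thus,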

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8").replace("\r\n", "\n")
open("Output.txt", "w").write(String)

will replicate your original file.
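An alternative sketch leaves the decoded text alone and turns off newline translation when writing, using open(..., newline=""). It also shows the answer's replace() normalization for comparison. The file name is a placeholder, and a temporary directory keeps the example self-contained:

```python
import os
import tempfile

data = b"first\r\nsecond\r\n"  # bytes as received from a Windows system

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Output.txt")
    # newline="" writes "\r\n" exactly as-is on every platform
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(data.decode("utf-8"))
    with open(path, "rb") as f:
        written = f.read()

print(written == data)  # True: no doubled "\r\r\n"

# The answer's approach: normalize to "\n" before writing in text mode
normalized = data.decode("utf-8").replace("\r\n", "\n")
print(repr(normalized))  # 'first\nsecond\n'
```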

Answer 14

For Python 3, this is a safer and more Pythonic approach to convert from bytes to a string:

def byte_to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):  # check if it's bytes
        print(bytes_or_str.decode('utf-8'))
    else:
        print("Object not of byte type")

byte_to_str(b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n')

输出：

total 0
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2
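The variant sketch below returns the text instead of printing it, so callers can reuse the result. The name to_text, the acceptance of bytearray, and the UTF-8 default are all assumptions of this example:

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes/bytearray; raise TypeError otherwise."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    if isinstance(value, str):
        return value  # already text
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)

print(to_text(b"total 0\n"), end="")  # total 0
```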

Answer 15

Try this:

bytes.fromhex('c3a9').decode('utf-8')
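bytes.fromhex() first turns a hex string into bytes, and decode() then interprets them. Here, c3 a9 is the UTF-8 encoding of é. The reverse direction is shown for comparison:

```python
text = bytes.fromhex("c3a9").decode("utf-8")
print(text)  # é

# Going back: str -> UTF-8 bytes -> hex string
print(text.encode("utf-8").hex())  # c3a9
```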

Answer 16

def toString(string):
    try:
        return string.decode("utf-8")
    except AttributeError:  # str has no .decode() in Python 3: it is already text
        return string

b = b'97.080.500'
s = '97.080.500'
print(toString(b))
print(toString(s))

Answer 17

If you want to convert any bytes, not just strings converted to bytes:


This is not very efficient, however. It will turn a 2 MB picture into 9 MB.
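The answer's original code is not included here. As one alternative, and not necessarily what the author used, arbitrary binary data can be carried losslessly in a str with base64. This grows the data by about 4/3 rather than several times. The payload below is random example bytes:

```python
import base64
import os

payload = os.urandom(3000)  # stand-in for arbitrary binary data, e.g. an image

# str containing only A-Z, a-z, 0-9, +, / and = padding
as_text = base64.b64encode(payload).decode("ascii")
restored = base64.b64decode(as_text)

print(len(payload), len(as_text))  # 3000 4000
print(restored == payload)         # True
```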

Answer 18

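For the specific case of "run a shell command and get its output as text instead of bytes", on Python 3.7 you should use subprocess.run and pass in text=True (as well as capture_output=True to capture the output):

command_result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
command_result.stdout  # is a `str` containing your program's stdout

text used to be called universal_newlines, and was changed (well, aliased) in Python 3.7. If you want to support Python versions before 3.7, pass in universal_newlines=True instead of text=True, and stdout=subprocess.PIPE instead of capture_output=True, which is also new in 3.7.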
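Here is a runnable variant of the same call. The Python interpreter stands in for ls so it works on any platform. check=True is an addition here: it raises CalledProcessError if the command fails:

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,  # Python 3.7+
    text=True,            # decode stdout/stderr to str
    check=True,           # raise if the exit status is non-zero
)
print(type(result.stdout).__name__, repr(result.stdout))  # str 'hello from child\n'
```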

Answer 19

Try this. This function ignores all binary data that is not valid in the given charset (e.g. UTF-8) and returns a clean string. It was tested with Python 3.6 and above.

def bin2str(text, encoding='utf-8'):
    """Convert binary data to a Unicode string, dropping undecodable bytes.
    text: binary string to work on
    encoding: encoding of the input data (default utf-8)"""

    return text.decode(encoding, 'ignore')

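Here, the function takes the binary data and decodes it, converting bytes to characters with one of Python's predefined character sets. The ignore argument drops all data that does not belong to that character set, and the function finally returns the desired string value.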

If you are unsure about the encoding, use locale.getpreferredencoding(False) to get your system's default encoding.
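Note that 'ignore' silently drops data. The comparison below applies the other built-in error handlers to the same input, a Latin-1 é that is invalid UTF-8, to show the trade-off:

```python
data = b"caf\xe9 ok"

print(data.decode("utf-8", "ignore"))            # caf ok
print(ascii(data.decode("utf-8", "replace")))    # 'caf\ufffd ok'
print(data.decode("utf-8", "backslashreplace"))  # caf\xe9 ok
```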

Answer 20

Decode with .decode(). This will decode the string. Pass in 'utf-8' as the argument.

Answer 21

From http://docs.python.org/3/library/sys.html:

To write or read binary data from/to the standard streams, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').
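Here is a self-contained sketch of that pattern. An io.TextIOWrapper over io.BytesIO stands in for sys.stdout, whose .buffer attribute works the same way. Flushing the text layer first keeps earlier text from landing after the raw bytes. The helper name write_bytes is hypothetical:

```python
import io

def write_bytes(text_stream, data):
    """Write raw bytes through a text stream's underlying binary buffer."""
    text_stream.flush()             # push pending text first so output stays in order
    text_stream.buffer.write(data)
    text_stream.buffer.flush()

# Stand-in for sys.stdout: a text layer on top of an in-memory binary buffer
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="utf-8")
fake_stdout.write("text, ")
write_bytes(fake_stdout, b"abc\n")
print(raw.getvalue())  # b'text, abc\n'
```

In a real script, the same helper would be called as write_bytes(sys.stdout, b"abc").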

Convert bytes to a string?
