Question

Here is my code:

# -*- coding: utf-8 -*-
import subprocess as sp
import locale

LOCAL_ENCODING = locale.getpreferredencoding()

cmds = ['dir', '/b', '*.txt']

out = sp.check_output(cmds, shell=True)
print(out)
print(out.decode(LOCAL_ENCODING))

s = 'レミリア・スカレート.txt'
print(s.encode(LOCAL_ENCODING, 'replace'))
print(LOCAL_ENCODING)
# print(s.encode('utf-8'))

Here is the output:

b'\xa5\xec\xa5\xdf\xa5\xea\xa5\xa2?\xa5\xb9\xa5\xab\xa5\xec\xa9`\xa5\xc8.txt\r\n'
レミリア?スカレート.txt

b'\xa5\xec\xa5\xdf\xa5\xea\xa5\xa2?\xa5\xb9\xa5\xab\xa5\xec\xa9`\xa5\xc8.txt'
cp936

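(A text file named 'レミリア・スカレート.txt' is in the script's directory.)

The output shows that the bytes returned for the filename have already been encoded with the local encoding (cp936). That encoding cannot represent the whole filename (note the ? in the bytes where '・' should be), so some information is lost.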
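The loss can be reproduced without running any subprocess. The following is a minimal sketch, assuming the local encoding is cp936 as reported in the output above; `unencodable_chars` is a hypothetical helper name introduced only for illustration.

```python
def unencodable_chars(text, encoding):
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


name = 'レミリア・スカレート.txt'

# With 'replace', anything cp936 cannot represent is substituted by '?'.
encoded = name.encode('cp936', 'replace')
round_tripped = encoded.decode('cp936')

# Print code points rather than the characters themselves, so the
# demonstration also works on a console that cannot display them.
print([hex(ord(ch)) for ch in unencodable_chars(name, 'cp936')])
print(round_tripped == name)
```

Once a character has been replaced by '?', decoding the bytes again cannot bring it back, which is why the fix has to happen before the bytes are produced.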

Environment:
   - Windows 10 (Simplified Chinese)
   - Python 3.5.1

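My question is:
Is it possible to avoid the automatic local encoding and get UTF-8 (or some other specified encoding) bytes instead? I read this issue, but it offers no solution :-(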

1. For built-in commands, this is solved by eryksun's answer:

out = sp.check_output('cmd.exe /u /c "dir /b *.txt"').decode('utf-16le')
  (/u: output Unicode characters (UCS-2 LE),
  /c: run the command and then terminate)
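As a slightly fuller sketch of the same idea, the decoding step can be separated from the call so it can be checked on its own. The function names `decode_cmd_unicode` and `list_txt_files` are hypothetical, and treating a non-zero exit status as "no matching files" is an assumption made to keep the example short.

```python
import subprocess
import sys


def decode_cmd_unicode(raw):
    """Decode bytes produced by `cmd.exe /u` and split them into lines."""
    return raw.decode('utf-16le').splitlines()


def list_txt_files():
    """Run `dir /b *.txt` through cmd.exe with Unicode output (Windows only)."""
    try:
        raw = subprocess.check_output('cmd.exe /u /c "dir /b *.txt"')
    except subprocess.CalledProcessError:
        # Assumption: a non-zero exit status means no file matched.
        return []
    return decode_cmd_unicode(raw)


if sys.platform == 'win32':
    for filename in list_txt_files():
        # ascii() keeps the output printable even on a cp936 console.
        print(ascii(filename))
```

This only helps for commands that cmd.exe implements itself, such as dir; the /u switch does not change what an external program writes to the pipe.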

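2. For external programs: [no general solution]
Configure the program to emit its output in the right encoding, through the external program's own options or configuration (such options may of course not exist).
For example, recent versions of WinRAR let you set the encoding of the console rar messages: rar lb -scur data > list.txt produces a Unicode list.txt containing the file names in the archive.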
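Reading such a list file back in Python might look like the sketch below. It assumes that "Unicode" here means UTF-16 little-endian, possibly preceded by a byte order mark; that is an assumption about the external program's output, not something confirmed above. `read_utf16_list` is a hypothetical helper, and the demonstration writes a stand-in file rather than calling rar.

```python
import os
import tempfile


def read_utf16_list(path):
    """Read a UTF-16 LE text file and return its non-empty lines."""
    with open(path, 'rb') as f:
        raw = f.read()
    text = raw.decode('utf-16le')
    # Drop a byte order mark if the producing program wrote one.
    if text.startswith('\ufeff'):
        text = text[1:]
    return [line for line in text.splitlines() if line]


# Stand-in for a list.txt written by an external program (assumed format).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'list.txt')
    with open(demo_path, 'wb') as f:
        f.write('\ufeffレミリア・スカレート.txt\r\n'.encode('utf-16le'))
    names = read_utf16_list(demo_path)

print([ascii(n) for n in names])
```

Going through a file avoids the console pipe entirely, so the names never pass through cp936.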

python3 subprocess returns locally encoded bytes, causing information loss

0 answers: