Question

I read a file into a string with Python, and the result shows up as encoded (I'm not sure which encoding).

query = ""
with open(file_path) as f:
    for line in f.readlines():
        print(line)
        query += line
query

Every line prints in English as expected

select * from table

but the final query is displayed as

ÿþd\x00r\x00o\x00p\x00 \x00t\x00a\x00b\x00l\x00e\x00

What is going on here?

Answer 1

This looks like UTF-16 data. Could you try decoding it as utf-16?

# Open in binary mode so read() returns bytes, which can then be decoded.
with open(file_path, 'rb') as f:
    query = f.read().decode('utf-16')
print(query)
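
As a quick check of this suggestion, the bytes shown in the question can be decoded directly. This is only a minimal sketch: the `raw` literal is retyped from the question's output (reading "ÿþ" as the bytes 0xFF 0xFE), so it is an assumption about what the file really contains.

```python
import codecs

# Bytes retyped from the question's output (an assumption about the file's content).
# "ÿþ" corresponds to 0xFF 0xFE, the UTF-16 little-endian byte order mark (BOM).
raw = b"\xff\xfed\x00r\x00o\x00p\x00 \x00t\x00a\x00b\x00l\x00e\x00"

# Confirm the data really starts with the UTF-16 LE BOM.
assert raw.startswith(codecs.BOM_UTF16_LE)

# The "utf-16" codec uses the BOM to choose the byte order and strips it from the result.
decoded = raw.decode("utf-16")
print(decoded)
```

If the decoded text is readable, the file is very likely UTF-16 and the original read was simply using the wrong codec.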

Answer 2

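Agree with Carlos: the encoding appears to be UTF-16LE. A BOM seems to be present, so encoding="utf-16" can detect automatically whether the data is little-endian or big-endian.

Idiomatic Python would be: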

with open(file_path, encoding="...") as f:
    for line in f:
        # do something with this line

In your case you append every line to query, so the whole code can be reduced to:

query = open(file_path, encoding="...").read()
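
The BOM auto-detection described above can be tried out with a small sketch. The helper name `read_utf16`, the sample text, and the temporary files are all made up for illustration; the only assumption about the real file is that it starts with a BOM.

```python
import codecs
import os
import tempfile


def read_utf16(path):
    """Read a whole text file, letting the utf-16 codec pick the byte order from the BOM."""
    with open(path, encoding="utf-16") as f:
        return f.read()


text = "select * from table\n"  # sample content, chosen for illustration
results = []

# Write the same text once as little-endian and once as big-endian, each with its BOM.
for bom, codec in ((codecs.BOM_UTF16_LE, "utf-16-le"), (codecs.BOM_UTF16_BE, "utf-16-be")):
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bom + text.encode(codec))
    try:
        results.append(read_utf16(path))
    finally:
        os.remove(path)

print(results)
```

Both files read back as the same string, which is why encoding="utf-16" is enough when a BOM is present; without a BOM, the byte order would have to be named explicitly ("utf-16-le" or "utf-16-be").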

Answer 3

# Python 2 only: `unicode` and str.decode() do not exist in Python 3.
with open(filePath) as f:
    fileContents =  f.read()
    if isinstance(fileContents, str):
        fileContents = fileContents.decode('ascii', 'ignore').encode('ascii') #note: this removes the character and encodes back to string.
    elif isinstance(fileContents, unicode):
        fileContents = fileContents.encode('ascii', 'ignore')

How do I decode a string read from a file?
