I read a file into a string with Python, and the result shows up as encoded (I'm not sure which encoding).
query = ""
with open(file_path) as f:
    for line in f.readlines():
        print(line)
        query += line
query
Every line prints in plain English, as expected:
select * from table
but the final query shows up as
ÿþd\x00r\x00o\x00p\x00 \x00t\x00a\x00b\x00l\x00e\x00
What is going on here?
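The symptom can be reproduced in a few lines. This is a minimal sketch that assumes two things: the file was saved as UTF-16 little-endian with a byte order mark (BOM), as the leading ÿþ suggests, and Python read it with a single-byte default encoding. Latin-1 is used here, and cp1252 maps these bytes the same way. Under those assumptions, the BOM bytes FF FE turn into ÿþ and every ASCII character is followed by a NUL (\x00). `print` writes those NULs, which most terminals do not display. The REPL echo of `query` uses `repr`, so it shows them as \x00.

```python
import codecs

# Assumption: the file is UTF-16LE with a BOM, e.g. produced by a Windows tool.
raw = codecs.BOM_UTF16_LE + "drop table".encode("utf-16-le")
print(raw[:6])  # b'\xff\xfed\x00r\x00'

# Decoding those bytes with a single-byte codec reproduces the garbled string.
misread = raw.decode("latin-1")
print(repr(misread))  # 'ÿþd\x00r\x00o\x00p\x00 \x00t\x00a\x00b\x00l\x00e\x00'

# Decoding with the right codec gives the intended text back.
print(raw.decode("utf-16"))  # drop table
```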
Answer 0 (score: 2)
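This looks like UTF-16 data. Can you try decoding it as utf-16? Note that file objects have no decode() method. Open the file in binary mode and decode the bytes that read() returns: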
with open(file_path, "rb") as f:
    query = f.read().decode("utf-16")  # decode the raw bytes, not the file object
print(query)
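Here is a minimal round trip that checks the fix, assuming a throwaway file created with `tempfile` (the suffix and contents are only illustrative). In binary mode, read() returns bytes, and decode() is called on those bytes. No newline is written, so Windows newline translation cannot affect the result.

```python
import os
import tempfile

# Illustrative setup: write a small UTF-16 file. The "utf-16" codec adds a BOM.
fd, file_path = tempfile.mkstemp(suffix=".sql")
os.close(fd)
with open(file_path, "w", encoding="utf-16") as f:
    f.write("select * from table")

# Read the raw bytes, then decode them. The BOM tells the codec the byte order.
with open(file_path, "rb") as f:
    raw = f.read()
query = raw.decode("utf-16")
os.remove(file_path)

print(raw[:2])      # the BOM, e.g. b'\xff\xfe' on little-endian machines
print(repr(query))  # 'select * from table'
```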
Answer 1 (score: 2)
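I agree with Carlos: the encoding appears to be UTF-16LE. A BOM seems to be present (the leading ÿþ is the byte pair FF FE, the little-endian byte order mark), so with encoding="utf-16" Python can detect automatically whether the data is little-endian or big-endian.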
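The BOM behaviour described above can be seen with the standard `codecs` module alone. This is a small sketch with illustrative text. The generic "utf-16" codec reads the BOM, picks the byte order from it, and drops it. An explicit-endian codec such as "utf-16-le" does not strip the BOM, so a U+FEFF character is left at the start of the string.

```python
import codecs

text = "drop table"
le_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")  # FF FE + little-endian data
be_bytes = codecs.BOM_UTF16_BE + text.encode("utf-16-be")  # FE FF + big-endian data

# "utf-16" detects the byte order from the BOM and removes the BOM.
print(le_bytes.decode("utf-16") == text)  # True
print(be_bytes.decode("utf-16") == text)  # True

# An explicit-endian codec keeps the BOM as U+FEFF (ZERO WIDTH NO-BREAK SPACE).
print(repr(le_bytes.decode("utf-16-le")[0]))  # '\ufeff'
```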
The idiomatic Python is:
with open(file_path, encoding="...") as f:
    for line in f:
        ...  # do something with this line
In your case you append every line to query, so the whole thing can be simplified to:
query = open(file_path, encoding="...").read()
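The one-liner above does not close the file explicitly and leaves that to the garbage collector. Below is a hedged alternative that closes the file deterministically. It assumes the file is UTF-16 with a BOM, as discussed above, and `read_query` is a hypothetical helper name. It uses either a `with` block or `pathlib.Path.read_text`, which opens, reads, and closes the file in one call.

```python
import os
import tempfile
from pathlib import Path


def read_query(file_path, encoding="utf-16"):
    """Hypothetical helper: return the whole file as a str, closing it afterwards."""
    with open(file_path, encoding=encoding) as f:
        return f.read()


# Illustrative usage with a temporary UTF-16 file.
fd, file_path = tempfile.mkstemp(suffix=".sql")
os.close(fd)
Path(file_path).write_text("select * from table", encoding="utf-16")

query = read_query(file_path)
same = Path(file_path).read_text(encoding="utf-16")  # pathlib equivalent
os.remove(file_path)

print(repr(query))    # 'select * from table'
print(query == same)  # True
```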
Answer 2 (score: 0)
# Python 2 only: str.decode() and the unicode type do not exist in Python 3
with open(filePath) as f:
    fileContents = f.read()
if isinstance(fileContents, str):
    # note: drops non-ASCII characters, then encodes back to a byte string
    fileContents = fileContents.decode('ascii', 'ignore').encode('ascii')
elif isinstance(fileContents, unicode):
    fileContents = fileContents.encode('ascii', 'ignore')
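The snippet above targets Python 2. In Python 3, str is already Unicode and has no decode() method. A rough Python 3 equivalent of the same ASCII-stripping idea is to encode with errors="ignore" and decode back, sketched here with the hypothetical helper name `strip_non_ascii`. This only discards non-ASCII characters. NUL (\x00) is an ASCII character, so it is kept.

```python
def strip_non_ascii(text: str) -> str:
    """Hypothetical helper: drop every character outside the ASCII range."""
    # encode() with errors="ignore" silently skips characters it cannot encode.
    return text.encode("ascii", "ignore").decode("ascii")


print(strip_non_ascii("café"))              # caf
print(repr(strip_non_ascii("ÿþd\x00r\x00")))  # 'd\x00r\x00'
```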