Question

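I scraped the word "Zápas" from a web page. The problem is that the closest version I can get in my data looks like Z\xe1pas after encoding with Windows-1250. Without str(a) I get some ANSI error. Thanks for your help.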

My attempt:

def encode(text):
    return text.encode('windows-1250')...
...
for cell in row.findAll(['td', 'th']):
    cell=encode(cell.get_text().strip())
    a.append(cell)
foo.write(str(a)+"\n")

Answer 1

The string a is probably a unicode string. To write it to a byte stream, you should encode it to a byte string in a specific encoding: windows-1250 in your case, utf-8 in mine. Just call a.encode(<encoding>) before passing the result to foo.write():

foo.write(a.encode("utf-8"))
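
As a small illustration of why the question's output shows Z\xe1pas: that is how Python displays the windows-1250 bytes of "Zápas" (0xE1 is "á" in that code page), and calling str() on a list writes this escaped display form of each element rather than the text itself. The sketch below assumes Python 3; the `cells` list is a hypothetical stand-in for the scraped data.

```python
# -*- coding: utf-8 -*-
word = u"Zápas"

# Encoding turns text into bytes; 0xE1 is "á" in windows-1250.
encoded = word.encode("windows-1250")
print(repr(encoded))

# str() of a list uses repr() of each element, so the escape sequence
# ends up in the output literally instead of the character.
cells = [encoded]  # hypothetical stand-in for the scraped row
as_list_text = str(cells)
print(as_list_text)

# Decoding with the same encoding restores the original text,
# so the bytes themselves were correct all along.
decoded = encoded.decode("windows-1250")
print(decoded == word)
```

So the encoded bytes are fine; it is the str(a) call on the whole list that puts the escaped form into the file.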

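Also, I would process all the data as unicode and convert it to a byte string only immediately before sending it "outside" (to a file, to a remote destination, etc.).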
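
One way to follow that advice is to keep every cell as text and let the file object do the encoding at the moment of writing. This is only a sketch under assumptions: `write_rows`, `rows`, and the temporary output path are hypothetical names, and the rows are hard-coded here instead of coming from the question's cell.get_text() calls.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_rows(path, rows, encoding="windows-1250"):
    """Write rows of text cells as tab-separated lines (hypothetical helper)."""
    # io.open encodes text to bytes as it is written, so the data
    # stays unicode everywhere else in the program.
    with io.open(path, "w", encoding=encoding) as out:
        for row in rows:
            out.write(u"\t".join(cell.strip() for cell in row) + u"\n")


# Hypothetical sample data standing in for the scraped table cells.
rows = [[u" Zápas ", u"ášČé"]]
path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_rows(path, rows)

# Reading the file back with the same encoding returns the original text.
with io.open(path, "r", encoding="windows-1250") as check:
    text_back = check.read()

# The raw bytes on disk contain the windows-1250 form of the word.
with io.open(path, "rb") as raw:
    data = raw.read()
```

Joining the cells into one line of text, rather than calling str() on the list, is what keeps the escaped form out of the file.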

Encoding Slovak letters such as: ášČé
