Question

My text is a fragment of HTML, and I want to save it to a file.

This works fine in debug mode in Eclipse, but fails when run from the shell. I'm using a short HTML example.

data = '<input type="hidden" name="charset_test" value="€,´,€,´,水,Д,Є" />'
with codecs.open('myfile.htm', 'wb', encoding="utf-8") as output:
    output.write(data)

I get:

 Exception 'ascii' codec can't decode byte 0xe2 in position XXX: ordinal not in range(128)

where XXX is the position in the text of the "strange" symbol, e.g. the EURO sign.

Why does this work from Eclipse but not from the shell? How can I fix it?

I have tried

HTMLParser.HTMLParser().unescape()
unquote()
unicode()

without any effect...

Answer 1

The following code works for me...

# coding=utf-8

import codecs

data = '<input type="hidden" name="charset_test" value="€,´,€,´,水,Д,Є" />'
with codecs.open('myfile.htm', 'wb', encoding="utf-8") as output:
    output.write(data.decode('utf-8'))

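...but if the source data is already UTF-8 encoded and you also want to write UTF-8 data, there is no need to decode it to a Python unicode object and then re-encode it to UTF-8. You can just do...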

# coding=utf-8

data = '<input type="hidden" name="charset_test" value="€,´,€,´,水,Д,Є" />'
with open('myfile.htm', 'wb') as output:
    output.write(data)
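
For what it's worth, the error in the question most likely comes from Python 2 implicitly decoding the byte string with the ASCII codec before the codecs writer can encode it as UTF-8. Why Eclipse behaves differently is not shown in the question; one possibility is that the IDE configures a different default encoding, but that is only a guess. Below is a rough sketch of both approaches above, written so that it should run on Python 2 and Python 3. The helper names (ascii_decode_error, save_text, save_bytes) and the output file names are made up for illustration, and io.open is swapped in for codecs.open here.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Unicode text (the u prefix matters on Python 2).
html = u'<input type="hidden" name="charset_test" value="€,´,水,Д,Є" />'

# The same text as UTF-8 bytes. This is what a plain Python 2 str literal
# holds when the source file is declared as coding=utf-8.
raw = html.encode('utf-8')


def ascii_decode_error(raw_bytes):
    """Hypothetical helper: return the ASCII decode error message, or None."""
    try:
        raw_bytes.decode('ascii')
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


def save_text(path, text):
    """Hypothetical helper: write unicode text, encoded as UTF-8 by io.open."""
    with io.open(path, 'w', encoding='utf-8') as output:
        output.write(text)


def save_bytes(path, raw_bytes):
    """Hypothetical helper: write already-encoded bytes unchanged."""
    with open(path, 'wb') as output:
        output.write(raw_bytes)


# Usage: write the same content both ways into a temporary directory.
out_dir = tempfile.mkdtemp()
text_path = os.path.join(out_dir, 'text.htm')
bytes_path = os.path.join(out_dir, 'bytes.htm')
save_text(text_path, html)
save_bytes(bytes_path, raw)

# Shows the same kind of message as in the question.
print(ascii_decode_error(raw))
```

Either way the file ends up with the same UTF-8 bytes. The general rule is to hand unicode text to an encoding-aware file object, or hand bytes to a binary file object, and not mix the two.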

Ascii codec can't decode HTML when saving to a file in Python
