Question

I am trying to decode the following string and I get an error.

item = lh.fromstring(items[1].text).text_content().strip().decode('utf-8')

File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)

UnicodeEncodeError: 'ascii' codec can't encode character u'\u20a8' in position 0: ordinal not in range(128)

Any ideas what is wrong?

items[1].text = <strong>₨ 18,500 </strong> 
repr(items[1].text) = u'\u20a8 18,500'

Answer 1

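The fact that you called decode but the error mentions encode is a clue: your string is already Unicode, not a byte string. decode converts a byte string to Unicode, and encode does the reverse. In Python 2, calling decode on a Unicode string first implicitly encodes it with the ascii codec, and that hidden step is what fails on u'\u20a8'.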
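To make the two directions concrete, here is a minimal sketch using the value from the question's repr output. The variable names are illustrative only, and the explicit encode('ascii') call stands in for the implicit step Python 2 performs when decode is called on a Unicode string.

```python
# -*- coding: utf-8 -*-
# Unicode text, matching repr(items[1].text) from the question.
text = u'\u20a8 18,500'

# encode: Unicode text -> byte string
raw = text.encode('utf-8')

# decode: byte string -> Unicode text
round_trip = raw.decode('utf-8')

# Python 2 implicitly runs the equivalent of text.encode('ascii') before
# decoding a Unicode string. Doing it explicitly reproduces the error.
try:
    text.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc
```

Here ascii_error ends up holding a UnicodeEncodeError, the same exception type reported in the traceback above.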

Answer 2

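It looks like you are trying to decode a string that is already decoded (Unicode). So remove .decode('utf-8') and it should work. Unless you mean something else by "decode" (perhaps you want to encode the string into some specific encoding).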
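If the same code path may receive either byte strings or Unicode text, one option is to decode only when needed. The helper below is a hypothetical sketch, not part of lxml or the original code; to_text and its encoding parameter are names assumed for illustration.

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding='utf-8'):
    """Return Unicode text, decoding only if value is a byte string.

    Hypothetical helper for illustration; not from lxml or the question.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already Unicode text: return unchanged, no decode needed.
    return value


already_text = to_text(u'\u20a8 18,500')
from_bytes = to_text(b'\xe2\x82\xa8 18,500')
```

In the question's case the result of text_content().strip() is already Unicode, so a helper like this would return it unchanged, which is equivalent to simply dropping the .decode('utf-8') call.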
