Question

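I am using lxml to get strings from a web page. How can I get the data I extracted as a string without getting the following error? I don't think I can use str() to solve the problem.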

In Python:

mystring = MySQLdb.escape_string(i.text_content())


(<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii', u"\n\nEve Pownall\n\n  \n    \n    \n    \n        Eve Pownall\n\t  (Author)\n\t\n        \u203a Visit Amazon's Eve Pownall Page\n        Find all the books, read about the author, and more.\n\n         See search results for this author  \n        Are you an author?\n        Learn about Author Central\n        \n      \n   \n  \n\n  \n      amznJQ.onReady('bylinePopover', function () {});\n  \n\n\n (Author)\n\n\n\n\n\n\n\n\n\n\n", 75, 76, 'ordinal not in range(128)'), <traceback object at 0x7f225c99f050>)

Answer 1

You need to explicitly encode the string in a well-known encoding (most likely UTF-8).
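A minimal sketch of that advice, written for Python 3 (the traceback above comes from Python 2, where the `unicode` type plays the role of Python 3's `str`). The helper `to_utf8_bytes` is hypothetical and the sample text is copied from the traceback. It reproduces the ASCII failure on the `\u203a` character and then encodes explicitly as UTF-8 instead.

```python
# Sketch (Python 3): reproduce the ASCII encoding failure, then encode explicitly.
# Sample text taken from the question's traceback; it contains U+203A.
text = "Eve Pownall\n        \u203a Visit Amazon's Eve Pownall Page"


def to_utf8_bytes(value):
    """Hypothetical helper: encode text as UTF-8, pass bytes through unchanged."""
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


try:
    # Implicit or explicit ASCII encoding fails on any character above 127.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)

# Explicit UTF-8 encoding succeeds and round-trips losslessly.
data = to_utf8_bytes(text)
print(data)
```

Applied to the question's Python 2 code, the same idea would look like `mystring = MySQLdb.escape_string(i.text_content().encode('utf-8'))`. This assumes the MySQL connection and table also use UTF-8, so that the stored bytes are read back as the same text.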

More information:

http://collective-docs.readthedocs.org/en/latest/troubleshooting/unicode.html
