Question

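I am trying to read an HTML file, but when I search the titles and URLs to compare them against my keywords in 'alist', I get this error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019'. Link to the error: http://tinypic.com/r/307w8bl/8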

Code

for q in soup.find_all('a'):
    title = (q.get('title'))
    url = ((q.get('href')))
    length = len(alist)
    i = 0
    while length > 0:
        if alist[i] in str(title): #checks for keywords from html form from the titles and urls
            r.write(title)
            r.write("\n")
            r.write(url)
            r.write("\n")
        i = i + 1
        length = length -1
doc.close()
r.close()

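A bit of background: alist contains a list of keywords that I compare against the titles to pick out the links I want. The strange thing is that if alist contains two or more words it runs perfectly, but if it contains only one word I get the error shown above. Thanks in advance.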

Answer 1

If your list has to be a list of byte strings (str), try encoding the title variable:

>>> alist = ['á']  # byte string (str), not pure ASCII
>>> title = u'á'  # unicode string
>>> alist[0] in title
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> title and alist[0] in title.encode('utf-8')
True
>>>
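
An alternative to encoding the title is to go the other way and decode the keywords, so that both sides of the comparison are Unicode text. The following is a minimal sketch, assuming any byte-string keywords are UTF-8 encoded; `to_text` is a hypothetical helper, not part of any library, and the sample data is made up.

```python
def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Keywords may arrive as bytes (for example, read from a file in binary mode).
alist = [b"\xc3\xa1", u"caf\u00e9"]
title = u"\u00e1 la carte"

# Normalize every keyword to text once, then compare text with text.
keywords = [to_text(k) for k in alist]
matches = [k for k in keywords if k in title]
print(len(matches))
```

Normalizing once at the boundary keeps the comparison itself free of any encode or decode calls.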

Answer 2

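Presumably title is a Unicode string, which can contain any kind of character; str(title) tries to convert it to a byte string using the ASCII codec, and fails because the title contains non-ASCII characters.

What are you trying to do? Why do you need to convert the title to a byte string?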
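
The failure described in this answer can be reproduced without BeautifulSoup. In Python 2, calling str() on a Unicode string amounts to encoding it with the default codec (ASCII); the sketch below makes that encode call explicit so it behaves the same way on Python 2 and Python 3. The sample title is made up for illustration.

```python
# A title containing U+2019 (right single quotation mark),
# the character named in the error message.
title = u"Today\u2019s headlines"

try:
    title.encode("ascii")  # what Python 2's str(title) does implicitly
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(type(error).__name__)

# Encoding with a codec that covers the character succeeds.
encoded = title.encode("utf-8")
print(len(encoded))
```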

Answer 3

The problem is str(title). You are trying to convert Unicode data to a byte string.

Why are you converting title to a string at all? You can use it directly.

soup.find_all returns a list of tags, and q.get('title') already gives you a string (or None if the attribute is missing).
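
Putting the advice from the answers together, the loop in the question could be rewritten roughly as follows. This is a sketch rather than a drop-in fix: `find_matches` and `write_matches` are hypothetical helpers, the (title, url) pairs stand in for (q.get('title'), q.get('href')) taken from soup.find_all('a'), and it assumes the keywords are already Unicode text and that UTF-8 is an acceptable output encoding.

```python
import io


def find_matches(links, keywords):
    """Return the (title, url) pairs whose title contains any keyword."""
    matches = []
    for title, url in links:
        if title is None:  # attribute missing on this tag
            continue
        # Compare text with text directly; no str() conversion needed.
        if any(keyword in title for keyword in keywords):
            matches.append((title, url or u""))
    return matches


def write_matches(matches, path):
    """Write each title and URL on its own line, encoded as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as out:
        for title, url in matches:
            out.write(title + u"\n" + url + u"\n")


# Made-up sample data in place of a parsed HTML document.
links = [
    (u"Today\u2019s news", u"http://example.com/news"),
    (None, u"http://example.com/untitled"),
]
print(len(find_matches(links, [u"news"])))
```

Unlike the original loop, this writes each matching link once even when several keywords match, and it skips links that have no title instead of comparing against the string "None".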

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019'
