UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' when reading a URL

Time: 2018-08-08 16:57:00

Tags: python-3.x

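I'm a beginner with Python, and I'm trying to get the number of results for a Google search.

I found some code that does this using the following modules: re, BeautifulSoup, and urllib.request.

The code works for plain ASCII characters, but it fails as soon as the search term contains a special character (for example 'é' or 'à').

I don't know where I should encode this URL. Could someone please help me?

Here is the Python 3 code: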

    from bs4 import BeautifulSoup
    from urllib.request import Request, urlopen
    import re
    def get_result(search):
        search = "https://www.google.com/search?q={}".format(search.replace(" ", "%20"))
        req_google = Request(search)
        req_google.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB;    rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
        html_google = urlopen(req_google).read()
        soup = BeautifulSoup(html_google, "html.parser")
        scounttext = str(soup.find('div', id='resultStats'))
        scounttext = scounttext[41:60].replace(u'\xa0', "")
        num = re.findall(r'\d+', scounttext)
        return int(num[0])

    print(get_result("é"))

It raises this error:

    UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 14: ordinal not in range(128)
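
The "position 14" in the message gives a hint about where the problem sits. As a minimal sketch, assuming the HTTP client encodes the request line as ASCII before sending it (the `request_line` string below is a hand-written approximation of that line, not something captured from the real request), the same error can be reproduced without any network access:

```python
# Hand-written approximation of the request line for get_result("é").
# "GET /search?q=" is 14 characters long, so 'é' sits at index 14.
request_line = "GET /search?q=é HTTP/1.1"

error = None
try:
    # Encoding to ASCII fails on the first character outside range(128).
    request_line.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)
    print("offending character:", exc.object[exc.start])
```

If that assumption holds, the search term has to be turned into ASCII before it is placed in the URL, rather than after the response is read.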

The error is raised when this line is evaluated:

    html_google = urlopen(req_google).read()
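
One place the encoding could go is where the URL is built, before `Request` ever sees it. The sketch below uses `urllib.parse.urlencode` from the standard library, which percent-encodes non-ASCII characters as UTF-8 bytes and turns spaces into `+`. The helper name `build_search_url` is hypothetical and not part of the original code; it only builds the URL and does not contact Google.

```python
from urllib.parse import urlencode


def build_search_url(search):
    """Return a search URL whose query string contains only ASCII."""
    # urlencode handles both spaces and non-ASCII characters, so the
    # manual search.replace(" ", "%20") is no longer needed.
    return "https://www.google.com/search?" + urlencode({"q": search})


print(build_search_url("é"))             # https://www.google.com/search?q=%C3%A9
print(build_search_url("café au lait"))  # https://www.google.com/search?q=caf%C3%A9+au+lait
```

Inside `get_result`, the first line could then become `search = build_search_url(search)`. This only addresses the encoding error; whether the `resultStats` parsing still matches the page Google returns is a separate question.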

0 answers:

No answers yet