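UnicodeEncodeError: 'ascii' codec can't encode character '\u2159' in position 32: ordinal not in range(128)

Time: 2019-06-28 06:58:34

Tags: python python-3.x url beautifulsoup python-unicode

I'm scraping a website with Python 3 and BeautifulSoup, but I get this error. I tried the solutions given in other answers, but none of them solved my problem.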

# -*- coding: utf-8 -*-
import os
import locale
os.environ["PYTHONIOENCODING"] = "utf-8"
myLocale=locale.setlocale(category=locale.LC_ALL, locale="en_GB.UTF-8")

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import pandas as pd


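def getrank(animeurl):
    # Fetch the anime page and parse it
    html = urlopen(animeurl)
    bslink = BeautifulSoup(html.read(), 'html.parser')

    # Extract the rank text and strip the 'Ranked #' prefix
    rank = bslink.find('span', {'class' : 'numbers ranked'}).get_text().replace('Ranked #', '')
    return rank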



def spring19():
    html = urlopen('https://...')
    bs = BeautifulSoup(html.read(), 'html.parser')

    link = []
    for x in bs.find_all('a', {'class' : 'link-title'}):
        link.append(x.get("href"))



    ranklist = []
    for x in link:
        # str.encode returns a new bytes object; the result is discarded here, so x is unchanged
        x.encode(encoding='UTF-8',errors='ignore')
        ranklist.append(getrank(x))

    return ranklist

spring19()


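The error message is: UnicodeEncodeError: 'ascii' codec can't encode character '\u2159' in position 32: ordinal not in range(128)

The error occurs because the URLs I'm scraping contain some symbols, but I still don't know how to fix it.

Thank you very much!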

1 Answer:

Answer 0: (score: 0)

I solved the problem using the solution from "How to convert a url string to safe characters with python?".
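To illustrate why this helps, here is a minimal sketch. As far as I can tell, urlopen ultimately has to send the request line as ASCII, so a character such as '\u2159' in the URL cannot be encoded. The URL below is a hypothetical stand-in, not one of the scraped links.

```python
from urllib.parse import quote

# Hypothetical URL containing U+2159 (VULGAR FRACTION ONE SIXTH)
url = "https://example.com/anime/1/title_\u2159"

# The raw URL cannot be encoded as ASCII, which is the step that fails
try:
    url.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# quote() replaces the character with its percent-encoded UTF-8 bytes,
# while the characters listed in `safe` are left untouched
quoted = quote(url, safe="/:?=&")

print(ascii_ok)  # False
print(quoted)    # https://example.com/anime/1/title_%E2%85%99
```

After quoting, the URL contains only ASCII characters, so it can be encoded without error.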

The code was modified as follows:

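    # Requires at the top of the file: from urllib.parse import quote
    ranklist = []
    for x in link:
        # Percent-encode non-ASCII characters, leaving URL delimiters intact
        x = quote(x, safe='/:?=&')
        ranklist.append(getrank(x))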
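
One caveat with this approach: if a link is already partly percent-encoded, quote() will encode the '%' again (for example '%20' becomes '%2520'). A possible variation, sketched below, also keeps '%' in the safe set. The helper name to_ascii_url and the example URL are hypothetical, and this assumes any '%' in the links is already part of a valid escape sequence.

```python
from urllib.parse import quote


def to_ascii_url(url: str) -> str:
    """Percent-encode non-ASCII characters in a URL (hypothetical helper)."""
    # '%' is kept safe so existing escapes such as %20 are not double-encoded
    return quote(url, safe="/:?=&%")


# Hypothetical URL that mixes an existing escape with a raw non-ASCII character
raw = "https://example.com/a%20b/\u2159"

once = to_ascii_url(raw)
twice = to_ascii_url(once)  # applying the helper again changes nothing

print(once)           # https://example.com/a%20b/%E2%85%99
print(once == twice)  # True
```

Characters outside the safe list, such as '#' or '@', would still be encoded, so the list may need adjusting for links that use them.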