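I am scraping artists from discogs.com, but I cannot get the artist names as they are displayed on the page. For example, when I run my code, the artist Andrés comes out as Andr\xe9s.
Can anyone explain what I am doing wrong?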
from bs4 import BeautifulSoup
import requests
import urllib2
from itertools import chain
import codecs
headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0' }
all_artists = []
result_pages = 1 #446
def load_artists():
    for page in xrange(1, result_pages + 1):
        url = 'https://www.discogs.com/search/?sort=have%2Cdesc&style_exact=House&genre_exact=Electronic&decade=2010&page=' + str(page)
        r = requests.get(url, headers=headers)
        soup = BeautifulSoup(r.content.decode('utf-8'), 'html.parser')
        # Collect the title attribute of each artist span in the search results
        for tag in soup.select('div#search_results h5 span'):
            all_artists.append(tag["title"])

load_artists()
all_artists
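Answer 0 (score: 0)
Nothing is wrong. The names are unicode strings; echoing the list at the interactive prompt shows its repr(), which in Python 2 escapes non-ASCII characters. When you ask Python to print them, they print correctly: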
for a in all_artists:
    print(a)
...
Andrés
...
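To make the difference between the escaped display and the printed text concrete, here is a minimal sketch. It is written for Python 3 and uses the built-in ascii() to imitate the escaping that Python 2 applies when a list is echoed (Python 2 would additionally show a u prefix). The sample names and the helper functions as_echoed and as_printed are hypothetical and only serve the illustration.

```python
# Hypothetical sample data standing in for the scraped titles.
all_artists = ["Andr\xe9s", "Bj\xf6rk"]


def as_echoed(names):
    """Return the escaped form, close to what Python 2 shows for an echoed list."""
    # ascii() escapes every non-ASCII character, much like Python 2's repr().
    return ascii(names)


def as_printed(names):
    """Return one name per line, i.e. the text that print() would write."""
    return "\n".join(names)


echoed = as_echoed(all_artists)    # contains the escape sequence \xe9
printed = as_printed(all_artists)  # contains the real character é
```

Calling print(printed) on a terminal that supports UTF-8 should show the names with their accents, while echoed keeps the backslash escapes. The underlying strings are the same in both cases; only the way they are displayed differs.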
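Answer 1 (score: 0)
You need to use Python 3; then you will no longer be affected by this. In Python 3, str is a Unicode type and repr() leaves printable non-ASCII characters unescaped.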
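As a rough illustration of the Python 3 behaviour, the sketch below decodes UTF-8 bytes and takes the repr() of a list containing the result. The byte string is a hypothetical stand-in for what r.content might hold for this artist; it is not taken from an actual Discogs response.

```python
# Hypothetical UTF-8 bytes, as a server might send them for this artist name.
raw = b"Andr\xc3\xa9s"

# Decoding turns the bytes into a Python 3 str (a Unicode string).
name = raw.decode("utf-8")

# In Python 3, repr() keeps printable non-ASCII characters as they are,
# so echoing a list no longer shows \xe9.
shown = repr([name])
```

Under these assumptions, shown holds the text ['Andrés'], which is what the Python 3 interactive prompt would display when the list is echoed.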