I'm trying to use BeautifulSoup to get a list of Amazon's best sellers. Here is the code I'm trying to use:
from urllib2 import urlopen
from bs4 import BeautifulSoup
from HTMLParser import HTMLParser
def main():
    html_parser = HTMLParser()

    soup = BeautifulSoup(urlopen("http://www.amazon.com/gp/bestsellers/").read())

    categories = []

    # Scrape list of category names and urls
    for category_li in soup.find(attrs={'id':'zg_browseRoot'}).find('ul').findAll('li'):
        category = {}
        category['name'] = html_parser.unescape(category_li.a.string)
        category['url'] = category_li.a['href']
        categories.append(category)

    del soup

    # Loop through categories and print out each product's name, rank, and url.
    for category in categories:
        print category['name']
        print '-'*50
        soup = BeautifulSoup(urlopen(category['url']))
        i = 1
        for title_div in soup.findAll(attrs={'class':'zg_title'}):
            if i == 1:
                print "%d. %s\n %s" % (i, html_parser.unescape(title_div.a.string), title_div.a['href'].strip())
            i += 1
        print ''

if __name__ == '__main__':
    main()
When I run the code, I get this error:
for category_li in soup.find(attrs={'id':'zg_browseRoot'}).find('ul').findAll('li'):
AttributeError: 'NoneType' object has no attribute 'find'
Why am I getting this error, and how can I fix it? Any help is appreciated.
Answer 0 (score: 1)
Try reading the contents for the second soup as well:
    for category in categories:
        print category['name']
        print '-'*50
        soup = BeautifulSoup(urlopen(category['url']).read())
        ...
It gives me some pretty nice output.
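As a side note, the AttributeError in the question means soup.find(attrs={'id':'zg_browseRoot'}) itself returned None, so the subsequent .find('ul') blew up. If that error ever comes back even after adding .read(), it can help to guard the first lookup. Below is a minimal sketch of such a guard, assuming the rest of main() stays the same; the early return and printed message are illustrative additions, not part of the original code:

    browse_root = soup.find(attrs={'id': 'zg_browseRoot'})
    if browse_root is None:
        # Hypothetical guard: Amazon may have returned a page without the
        # category list (for example a robot-check page), so bail out
        # cleanly instead of raising AttributeError on None.
        print 'zg_browseRoot not found; the page layout may have changed'
        return

    # Scrape list of category names and urls, as in the original code
    for category_li in browse_root.find('ul').findAll('li'):
        category = {}
        category['name'] = html_parser.unescape(category_li.a.string)
        category['url'] = category_li.a['href']
        categories.append(category)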