The URL in question is: http://www.amazon.com/s/field-keywords=machine%20learning
I want to extract the links of all matching results on the first page. I believe the relevant HTML is:
<h3 class="newaps">
<a href="https://rads.stackoverflow.com/amzn/click/com/1600490069" rel="nofollow noreferrer">
<span class="lrg bold"> … </span>
</a>
<span class="med reg"> … </span>
</h3>
Here is what I have tried so far, without success:
In [39]: url = "http://www.amazon.com/s/field-keywords=machine%20learning"
In [40]: page = urllib2.urlopen(url)
In [42]: html = page.read()
In [44]: soup = BeautifulSoup(html)
In [45]: soup.findAll('h3',{'class': 'newaps'})
Out[45]: []
The search returns an empty list, and I'm not sure what I'm doing wrong here.
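To make the goal concrete, here is a minimal sketch of the extraction I am after. It runs against a hard-coded copy of the snippet above rather than the live page, so it only shows what I expect to get if the fetched HTML really contains that markup. It assumes Python 3 and uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra installs. `SAMPLE_HTML`, `NewapsLinkParser`, and `extract_links` are names made up for this illustration.

```python
from html.parser import HTMLParser

# Hard-coded copy of the snippet quoted in the question (not fetched from the live page).
SAMPLE_HTML = """
<h3 class="newaps">
<a href="https://rads.stackoverflow.com/amzn/click/com/1600490069" rel="nofollow noreferrer">
<span class="lrg bold"> … </span>
</a>
<span class="med reg"> … </span>
</h3>
"""


class NewapsLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <h3 class="newaps">."""

    def __init__(self):
        super().__init__()
        self.inside_h3 = False  # True while inside a matching <h3>
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The class attribute may hold several space-separated class names.
        classes = (attrs.get("class") or "").split()
        if tag == "h3" and "newaps" in classes:
            self.inside_h3 = True
        elif tag == "a" and self.inside_h3 and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "h3":
            self.inside_h3 = False


def extract_links(html):
    """Return the hrefs found under <h3 class="newaps"> in the given HTML string."""
    parser = NewapsLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML):
        print(link)
```

If the same lookup returns nothing on the downloaded page, comparing the `html` string from `page.read()` with the snippet above would show whether the `newaps` markup is actually present in the response.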