Question

from lxml import html
import requests
import time


#Gets prices
page = requests.get('https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=hi')
tree = html.fromstring(page.content)
price = tree.xpath('//h2[@data-attribute="Hi Guess the Food - What’s the Food Brand in the Picture"]/text()')

print(price)

This only returns []

When I look at page.content, it shows Amazon's anti-bot page. How can I get around this?

Answer 1

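A general piece of advice for when you are trying to scrape something from a website: look at what was actually returned, in this case page.content, before trying anything else. You are wrongly assuming that Amazon lets you fetch their data freely, when they do not.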
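One way to follow this advice is to check the response before parsing it. The sketch below is only an illustration: the helper name looks_blocked and the marker strings are assumptions, not anything Amazon documents, and the real block page may be worded differently.

```python
# Hypothetical marker strings; the real block page may use different wording.
BLOCK_MARKERS = ("captcha", "robot check", "automated access")


def looks_blocked(status_code, body):
    """Return True if a response looks like an anti-bot page, not search results.

    status_code is the HTTP status; body is the raw response bytes
    (what requests exposes as page.content).
    """
    if status_code != 200:
        return True
    text = body.decode("utf-8", errors="replace").lower()
    return any(marker in text for marker in BLOCK_MARKERS)


# Usage with the request from the question (not executed here):
#   page = requests.get(url)
#   print(page.status_code)
#   print(page.content[:500])   # read what actually came back first
#   if looks_blocked(page.status_code, page.content):
#       print("Got a block page; the XPath will match nothing")
```

If this returns True, an empty XPath result is expected, and the selector itself is not the problem.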

Answer 2

I think urllib2 is better, and the xpath could be:

# c is assumed to be the parsed lxml tree (e.g. the result of html.fromstring)
price = c.xpath('//div[@class="s-item-container"]//h2')[0]
print(price.text)

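After all, a long string may contain odd characters (such as the curly apostrophe in "What’s"), so matching on it is fragile.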
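The idea of selecting by container class instead of by the full title can be tried offline. This is a minimal sketch under stated assumptions: the sample markup is made up, and it uses the standard library's xml.etree.ElementTree in place of lxml so that it runs without a network request or extra packages. ElementTree only supports a subset of XPath and needs well-formed input, so real pages would still call for lxml.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for a results page; the real markup may differ.
SAMPLE = """
<html><body>
  <div class="s-item-container">
    <a href="#"><h2>Hi Guess the Food - What’s the Food Brand in the Picture</h2></a>
  </div>
  <div class="s-item-container">
    <a href="#"><h2>Another <b>result</b></h2></a>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Select by the container's class, not by the title string itself.
headings = root.findall(".//div[@class='s-item-container']//h2")

# itertext() also collects text nested inside child tags such as <b>,
# which a plain .text lookup would miss.
titles = ["".join(h2.itertext()).strip() for h2 in headings]

for title in titles:
    print(title)
```

Because the selector never spells out the title, the curly apostrophe in the first heading cannot cause a mismatch.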

How do I make the code return text using xpath?
