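Getting an empty list when using lxml and XPath in Python

Time: 2019-09-02 14:43:12

Tags: python html xpath web-scraping lxml

I have this code, which is supposed to get the price of any item on Amazon. However, instead of the price, I get an empty list.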

from lxml import html
import requests

page = requests.get('https://www.amazon.com/gp/product/B06XP634L1?pf_rd_p=183f5289-9dc0-416f-942e-e8f213ef368b&pf_rd_r=W4XQCYJ4N9VQGF8HDAH0')
doc = html.fromstring(page.content)
price = doc.xpath("//span[@id='priceblock_ourprice']")
print(price)

This used to work for me. I would appreciate any help. Thanks in advance.

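1 Answer:

Answer 0: (score: 0)

You need to add a User-Agent header. Without one, the site may serve a different page that does not contain the price element, so the XPath matches nothing and returns an empty list.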

from lxml import html
import requests

headers = {'User-Agent': 'Mozilla/5.0'}  # identify as a browser-like client
page = requests.get('https://www.amazon.com/gp/product/B06XP634L1?pf_rd_p=183f5289-9dc0-416f-942e-e8f213ef368b&pf_rd_r=W4XQCYJ4N9VQGF8HDAH0', headers = headers)
doc = html.fromstring(page.content)
price = doc.xpath("//span[@id='priceblock_ourprice']")
print(price[0].text)

# Alternatively, select the text node directly; xpath() then returns a list of strings
price = doc.xpath("//span[@id='priceblock_ourprice']/text()")
print(price)
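
Since `xpath()` returns an empty list when nothing matches, `price[0]` raises an IndexError whenever the page lacks the span. Below is a minimal sketch of a guarded lookup that runs offline. `SAMPLE_PAGE`, `BLOCKED_PAGE`, and `extract_price` are hypothetical names made up for illustration, and the inline HTML is only a stand-in; the real Amazon markup may differ.

```python
from lxml import html

# Hypothetical stand-in for a product page; real Amazon markup may differ.
SAMPLE_PAGE = """
<html><body>
  <span id="priceblock_ourprice">$19.99</span>
</body></html>
"""

# Hypothetical stand-in for a response that lacks the price element,
# e.g. what might come back when the request is not treated as a browser.
BLOCKED_PAGE = "<html><body><p>Sorry, something went wrong.</p></body></html>"


def extract_price(page_source):
    """Return the stripped price text, or None if the span is missing."""
    doc = html.fromstring(page_source)
    matches = doc.xpath("//span[@id='priceblock_ourprice']/text()")
    # xpath() returns a list; it is empty when nothing matches,
    # so indexing with [0] unconditionally would raise IndexError.
    return matches[0].strip() if matches else None


print(extract_price(SAMPLE_PAGE))
print(extract_price(BLOCKED_PAGE))
```

With a live response you would pass `page.content` to `extract_price`; a `None` result is a hint to inspect `page.status_code` and the returned HTML before blaming the XPath.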

With bs4:

from bs4 import BeautifulSoup as bs
import requests

headers = {'User-Agent': 'Mozilla/5.0'}  # identify as a browser-like client
page = requests.get('https://www.amazon.com/gp/product/B06XP634L1?pf_rd_p=183f5289-9dc0-416f-942e-e8f213ef368b&pf_rd_r=W4XQCYJ4N9VQGF8HDAH0', headers = headers)
soup = bs(page.content, 'lxml')
price = soup.select_one("#attach-base-product-price")['value']
print(price)
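
The bs4 version has a similar failure mode: `select_one()` returns None when the selector matches nothing, so subscripting with `['value']` raises a TypeError. Here is a minimal offline sketch of a guarded version. `SAMPLE_PAGE` and `extract_price_value` are hypothetical names, the `<input>` element is an assumption about what carries that id, and `html.parser` is used only so the sketch needs nothing beyond bs4 (the answer itself uses the `lxml` parser).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the element type on the real page is an assumption.
SAMPLE_PAGE = (
    '<html><body>'
    '<input type="hidden" id="attach-base-product-price" value="19.99">'
    '</body></html>'
)


def extract_price_value(page_source):
    """Return the element's value attribute, or None if it is absent."""
    # 'html.parser' ships with Python; the answer above uses 'lxml' instead.
    soup = BeautifulSoup(page_source, "html.parser")
    tag = soup.select_one("#attach-base-product-price")
    # select_one() returns None when the selector matches nothing.
    if tag is None:
        return None
    # Tag.get() returns None rather than raising if the attribute is missing.
    return tag.get("value")


print(extract_price_value(SAMPLE_PAGE))
print(extract_price_value("<html><body></body></html>"))
```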