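I want the code below to return the text "In Stock" or "Out of Stock" (it checks the inventory of an online store), but it only returns "[]". The XPath was copied from the browser's element inspector and appears to be valid. I read online that namespaces can cause problems here. Any hints?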
from lxml import html
import requests
url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'
path = '//*[@id="content"]/section/section/div/font/div[7]/div/div[1]/div[2]/ul/li[1]/div/text()'
page = requests.get(url)
tree = html.fromstring(page.content)
stock = tree.xpath(path)
print(stock)
Edit: solution based on Padraic Cunningham's post.
It still relies on some positional paths, so it is not the most elegant, but at least it works:
from lxml import html
import requests
import re
# in stock example URL
#url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'
# out of stock example URL
url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/microsoft-basic-optical-mouse/p/108029878'
path = '//ul[@class="availability"]/li[./div[1]]'
inner_path = './div[1]/text()'
page = requests.get(url)
tree = html.fromstring(page.content)
stock = tree.xpath(path)
current = stock[0].xpath(inner_path)
print(current[0])
if re.search(r'in.*stock.*online', current[0], flags=re.IGNORECASE):
    print("Success!")
else:
    print("Keep waiting...")
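To see how the case-insensitive pattern above behaves without hitting the site, here is a small sketch run against made-up availability strings (the samples and the stricter `strict` pattern are assumptions for illustration, not text taken from the store). Note that the loose pattern also matches a phrase like "Not in stock online", so anchoring at the start of the string may be safer if the site ever uses such wording.

```python
import re

# Pattern from the snippet above: "in", "stock", "online" in that order.
loose = re.compile(r'in.*stock.*online', flags=re.IGNORECASE)

# Hypothetical stricter alternative: the text must start with "in stock".
# In Python 3, \s also matches the non-breaking space (\xa0).
strict = re.compile(r'\s*in\s+stock\b', flags=re.IGNORECASE)


def matches_loose(text):
    """True if the loose pattern is found anywhere in the text."""
    return loose.search(text) is not None


def matches_strict(text):
    """True only if the text begins with 'in stock' (ignoring case)."""
    return strict.match(text) is not None


# Made-up sample strings, not fetched from the site.
samples = ["In Stock Online", "Out of Stock Online", "Not in stock online"]
for s in samples:
    print(s, matches_loose(s), matches_strict(s))
```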
Answer 0 (score: 1)
Your XPath is wrong:
from lxml import html
import requests
url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'
path = '//ul[@class="availability"]/li[./div[@class="availability-text in-stock"]]'
page = requests.get(url)
tree = html.fromstring(page.content)
stock = tree.xpath(path)
current = stock[0].xpath('./div[@class="availability-text in-stock"]/text()')
print(current[0])
for node in stock[1:]:
    print(node.xpath('./div[@class="availability-text in-stock"]/a/@aria-label'))
This gives you:
In Stock Online
In Stock YORKDALE MALL
In Stock LAWRENCE SQUARE
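The availability is in an unordered list with the class availability. Our XPath pulls every li in that list that has a child div with the class availability-text in-stock. All of those divs bar the first contain an anchor: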
<a class="underline"
aria-label="In Stock YORKDALE MALL"
title="View Store Details"
href="#product-store-availability">
YORKDALE MALL</a>
You can see that the aria-label contains both the availability and the store.
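The same queries can be tried offline against a small hand-written fragment. This is only a sketch: the markup below is a simplified guess at the page structure based on the anchor shown above, not the site's real HTML.

```python
from lxml import html

# Simplified, hand-written stand-in for the availability list (assumption).
snippet = """
<html><body>
<ul class="availability">
  <li><div class="availability-text in-stock">In Stock Online</div></li>
  <li><div class="availability-text in-stock"><a class="underline"
      aria-label="In Stock YORKDALE MALL"
      href="#product-store-availability">YORKDALE MALL</a></div></li>
  <li><div class="availability-text in-stock"><a class="underline"
      aria-label="In Stock LAWRENCE SQUARE"
      href="#product-store-availability">LAWRENCE SQUARE</a></div></li>
</ul>
</body></html>
"""

tree = html.fromstring(snippet)

# Every li that has an in-stock div as a direct child.
items = tree.xpath(
    '//ul[@class="availability"]/li[./div[@class="availability-text in-stock"]]'
)

# The first li holds the online status as plain text.
online = items[0].xpath('./div[@class="availability-text in-stock"]/text()')[0]

# The remaining li elements carry the status in the anchor's aria-label.
stores = [
    li.xpath('./div[@class="availability-text in-stock"]/a/@aria-label')[0]
    for li in items[1:]
]

print(online)
print(stores)
```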
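If you want to separate the availability from the store, you can split on the non-breaking space (&nbsp;, which is "\xa0" in Python):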
print(node.xpath('./div[@class="availability-text in-stock"]/a/@aria-label')[0].split("\xa0"))
Which will give:
['In Stock ', ' YORKDALE MALL']
['In Stock ', ' LAWRENCE SQUARE']
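The pieces above keep their surrounding spaces. A minimal sketch of tidying them up, assuming the label has the form "In Stock", space, non-breaking space, space, store name (inferred from the output shown, not checked against the live page):

```python
# Hypothetical label value; the exact position of \xa0 is an assumption
# based on the split output shown above.
label = "In Stock \xa0 YORKDALE MALL"

# Split on the non-breaking space, then strip ordinary spaces from each part.
parts = [part.strip() for part in label.split("\xa0")]

# Unpack into the two fields of interest.
status, store = parts
print(status)
print(store)
```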
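Your browser tools are essential when scraping; just don't rely on the XPath or selector they give you when you right-click and choose copy xpath / selector. Look at the source instead and try to find an ID or class name associated with the content you are trying to parse.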
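As an illustration of why position-based paths are fragile, the sketch below runs two queries against made-up markup (the element layout and class names other than availability are invented for the example). One likely reason a copied path fails is that the tree the browser shows can differ from the HTML that requests downloads, so a path that counts wrappers and positions can easily miss.

```python
from lxml import html

# Made-up page: the availability list sits inside the second inner div.
page = """
<html><body>
  <div id="content">
    <div class="main">
      <div class="promo">Sale!</div>
      <div class="product">
        <ul class="availability">
          <li><div class="availability-text in-stock">In Stock Online</div></li>
        </ul>
      </div>
    </div>
  </div>
</body></html>
"""

tree = html.fromstring(page)

# Position-based path in the style of a browser's "copy XPath". It assumes
# the list lives in the first inner div, which is not the case here.
brittle = tree.xpath('//*[@id="content"]/div/div[1]/ul/li[1]/div/text()')

# Class-based path: does not care how many wrappers come before the list.
robust = tree.xpath('//ul[@class="availability"]/li/div/text()')

print(brittle)
print(robust)
```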
If you only want the first one, you can still do it all in XPath:
url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'
path = '(//ul[@class="availability"]/li/div[@class="availability-text in-stock"])[1]/text()'
page = requests.get(url)
tree = html.fromstring(page.content)
stock = tree.xpath(path)
success = {"in","stock"}
# Require both words to be present; extra words such as "online" are allowed.
if stock and success.issubset(stock[0].lower().split()):
    print("Success")
else:
    print("Failure")
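The word check can also be pulled into a small helper and tried on sample inputs. This is a sketch, not part of the original answer: the name `looks_in_stock` is hypothetical, and it uses `set.issubset` so that extra words such as "online" do not cause a failure.

```python
# Words that must all appear in the availability text.
REQUIRED = {"in", "stock"}


def looks_in_stock(text_nodes):
    """Check the first string of an XPath text() result for the required words.

    text_nodes is a list of strings; an empty list means the XPath
    matched nothing, which is treated as not in stock.
    """
    if not text_nodes:
        return False
    # split() with no arguments also splits on the non-breaking space.
    words = set(text_nodes[0].lower().split())
    return REQUIRED.issubset(words)


# Made-up inputs for illustration.
print(looks_in_stock(["In Stock Online"]))
print(looks_in_stock(["Out of Stock"]))
print(looks_in_stock([]))
```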