Question

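I want the following code to return the text "In Stock" or "Out of Stock" (it checks an online store's inventory), but it only returns "[]". The XPath was copied from the browser's element inspector and appears to be valid. I have read online that namespaces can cause problems. Any hints?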

from lxml import html
import requests

url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'
path = '//*[@id="content"]/section/section/div/font/div[7]/div/div[1]/div[2]/ul/li[1]/div/text()'

page = requests.get(url)
tree = html.fromstring(page.content)
stock = tree.xpath(path)
print(stock)

Edit: solution based on Padraic Cunningham's post.

It is still not the most elegant, since it relies on some absolute paths, but at least it works:

from lxml import html
import requests
import re

# in stock example URL
#url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'

# out of stock example URL
url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/microsoft-basic-optical-mouse/p/108029878'

path = '//ul[@class="availability"]/li[./div[1]]'
inner_path = './div[1]/text()'

page = requests.get(url)
tree = html.fromstring(page.content)
stock = tree.xpath(path)
current = stock[0].xpath(inner_path)

print(current[0])
if re.search(r'in.*stock.*online', current[0], flags=re.IGNORECASE):
    print("Success!")
else:
    print("Keep waiting...")

Answer 1

Your XPath is wrong:

from lxml import html
import requests

url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'
path = '//ul[@class="availability"]/li[./div[@class="availability-text in-stock"]]'

page = requests.get(url)
tree = html.fromstring(page.content)

stock = tree.xpath(path)
current = stock[0].xpath('./div[@class="availability-text in-stock"]/text()')
print(current[0])
for node in stock[1:]:
    print(node.xpath('./div[@class="availability-text in-stock"]/a/@aria-label'))

This gives you:

  In Stock Online
In Stock   YORKDALE  MALL
In Stock   LAWRENCE SQUARE

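The availability is in an unordered list with the class availability. Our XPath selects every li child that contains a div with the class availability-text in-stock. Every one of those divs except the first contains an anchor: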

            <a class="underline"
            aria-label="In Stock &nbsp; YORKDALE  MALL"
            title="View Store Details"
            href="#product-store-availability">
                YORKDALE  MALL</a>

You can see that the aria-label contains both the availability and the store.

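If you want the availability and the store separately, you can split on the non-breaking space (&nbsp;, which the parser decodes to "\xa0"):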

print(node.xpath('./div[@class="availability-text in-stock"]/a/@aria-label')[0].split("\xa0"))

Which would give:

['In Stock ', ' YORKDALE  MALL']
['In Stock ', ' LAWRENCE SQUARE']
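The split above leaves stray spaces around each piece. As a minimal sketch, a small helper can trim them; `split_availability` is a hypothetical name, not part of lxml, and it assumes the label always has the form "status, non-breaking space, store":

```python
def split_availability(label):
    """Split 'status<nbsp>store' into a (status, store) tuple, trimming whitespace."""
    # partition() splits on the first non-breaking space only
    status, _, store = label.partition("\xa0")
    return status.strip(), store.strip()


# Sample value copied from the aria-label shown earlier (&nbsp; decoded to "\xa0")
label = "In Stock \xa0 YORKDALE  MALL"
print(split_availability(label))
```

If the label contains no non-breaking space, `partition` returns the whole string as the status and an empty store, so the helper does not raise.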

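Your browser tools are essential when scraping, but don't rely on the XPath or selector they give you when you right-click and choose "copy XPath" or "copy selector". Look at the page source instead and try to find IDs or class names associated with the content you are trying to parse.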
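To illustrate class-based selection, here is a sketch that runs against inline sample markup rather than the live site. The markup is hypothetical: in particular the `out-of-stock` class name is an assumption, since only the in-stock class appears in this thread. Matching the single class token `availability-text`, instead of the full attribute value, lets the same path cover both cases:

```python
from lxml import html

# Hypothetical markup; the "out-of-stock" class name is an assumption.
SAMPLE = """
<html><body>
<ul class="availability">
  <li><div class="availability-text in-stock">In Stock Online</div></li>
  <li><div class="availability-text out-of-stock">Out of Stock</div></li>
</ul>
</body></html>
"""

# Select any div whose class attribute contains the token "availability-text",
# regardless of which other classes accompany it.
TOKEN_PATH = (
    '//ul[@class="availability"]/li'
    '/div[contains(concat(" ", normalize-space(@class), " "), " availability-text ")]'
)

tree = html.fromstring(SAMPLE)
statuses = [div.text_content().strip() for div in tree.xpath(TOKEN_PATH)]
print(statuses)
```

The `concat`/`normalize-space` idiom is the usual XPath 1.0 way to test for one class token; a plain `contains(@class, "availability-text")` would also match longer names that merely include that substring.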

If you only want the first one, you can still use XPath:

url = 'http://www.thesource.ca/en-ca/computers-and-tablets/computer-accessories/mice/logitech-m310-wireless-mouse/p/2618659'
path = '(//ul[@class="availability"]/li/div[@class="availability-text in-stock"])[1]/text()'

page = requests.get(url)
tree = html.fromstring(page.content)
stock = tree.xpath(path)
success = {"in", "stock"}

# The text is "In Stock Online", so check that every required word is present
if stock and success.issubset(stock[0].lower().split()):
    print("Success")
else:
    print("Failure")

XPath returns an empty list (namespace issue?)
