Question

I am using Scrapy and XPath. In one scenario, I need to get both the href and the text of anchor elements.

What I am doing:

Using a selector
Looping over the anchors to find the href and the text. I am able to get the href, but not the text.

Here is a code snippet to make this clearer:

anchors = response.selector.xpath("//table[@class='style1']//ul//li//a")
for anchor in anchors:
    link = anchor.xpath('@href').extract()[0]
    name = anchor.xpath('[how-to-access-current-node-here]').text()

How can I achieve this?

Thanks in advance!

Answer 1

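You can use XPath text(), provided you know where the title text is located relative to the a element. If the title text lives in, say, the parent element of the a from your example, then go back just one level and extract it, like this: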

anchors = response.selector.xpath("//table[@class='style1']//ul//li//a")
for anchor in anchors:
    link = anchor.xpath('@href').extract()[0]
    # go one level back and access text()
    name = anchor.xpath('../text()').extract()

Or, better yet, you don't even need to do this in a for loop. Just call extract() on the whole selection, which returns a list:

anchors = response.selector.xpath("//table[@class='style1']//ul//li//a")

links = anchors.xpath('@href').extract()
names = anchors.xpath('../text()').extract()

paired_links_with_names = zip(links, names)
...
# you may do your thing here or still do a for / loop

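Of course, you need to inspect the element and find out where the title text actually is; this is just how you would reach that text from your existing XPath position.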

Hope this helps.

XPath - How to access the anchor text and href from the current node in a loop
