Question

Using the Python library Scrapy, I run the following:

scrapy shell "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"

From there, I want to get the link and the text of each returned item:

response.xpath('//div[@class="title-and-desc"]/a')

However, this returns only the links and not the text. Here is a sample of what comes back:

response.xpath('//div[@class="title-and-desc"]/a')
[<Selector xpath='//div[@class="title-and-desc"]/a' data=u'<a target="_blank" href="http://www.brpr'>, <Selector xpath='//div[@class="title-and-desc"]/a' data=u'<a target="_blank" href="http://www.dive'>, <Selector xpath='//div[@class="title-and-desc"]/a' data=u'<a target="_blank" href="http://rhodesmi'>,

I can loop over the results above, where i is the variable for each iteration:

i.xpath("text()").extract_first(),
i.xpath("@href").extract_first()

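But only the @href values are returned. I assume this is because text() finds nothing to retrieve in the result. What needs to change so that I also get the accompanying link text?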

For reference, the full Scrapy example comes from here: Scrapy Tutorial Example.

Answer 1

That is because the text you are looking for is inside a child div node:

<div class="title-and-desc">
  <a target="_blank" href="http://www.network-theory.co.uk/python/intro/">
    <div class="site-title">An Introduction to Python </div>
  </a>
</div>

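You can get all the text of a node, including the text of its descendants, by using //text() instead of text() (written .//text() when applied relative to a selector such as i, so the search stays inside that node), or you can use an explicit XPath that steps into the child div.

Try: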

a/div/text()
