Question

I am trying to extract the link from an anchor tag using XPath.

HTML

<a class="text size-1x-small font-accent color-brand all-caps"
   href="http://time.com/section/business" 
   data-reactid="199">
       Business
</a>

Code

item["category"] = str(
    response.xpath(
        '//a[@class="text size-1x-small font-accent color-brand all-caps"]/text()'
    ).extract()
)
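The snippet above passes the list returned by extract() straight to str(), which produces the list's repr rather than the anchor text. A minimal sketch, assuming extract() returned a single raw text node with the whitespace shown in the HTML:

```python
# Assumed result of .extract(): a list holding one raw text node.
extracted = ["\n       Business\n"]

# Wrapping the list in str() gives its repr, brackets and quotes included.
as_str = str(extracted)
print(as_str)  # ['\n       Business\n']

# Stripping each item and joining gives the plain text instead.
category = " ".join(s.strip() for s in extracted)
print(category)  # Business
```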

And the Python function:

def parseSave(self, response):
    item = NYtimesItem()
    item["category"] = response.xpath(
        '//a[@class="text size-1x-small font-accent color-brand all-caps"]/text()'
    ).extract()

    yield item

Please tell me what I am doing wrong. The expected output is the text of the anchor tag, e.g. Business.
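To check the selector outside Scrapy, here is a sketch that swaps in the standard library's xml.etree.ElementTree, which supports a small XPath subset including the [@attr='value'] predicate. It assumes the anchor is wrapped in a hypothetical root <div> so it parses as XML. In Scrapy itself, the rough equivalent would be calling .get() on the /text() selector and then .strip() on the result.

```python
import xml.etree.ElementTree as ET

# The anchor from the question, wrapped in a root element so it parses as XML.
HTML = """<div>
<a class="text size-1x-small font-accent color-brand all-caps"
   href="http://time.com/section/business"
   data-reactid="199">
       Business
</a>
</div>"""

CLASS = "text size-1x-small font-accent color-brand all-caps"

root = ET.fromstring(HTML)
# [@class='...'] matches the attribute value exactly, spaces included.
anchor = root.find(f".//a[@class='{CLASS}']")
# .text is the raw text node with its surrounding newlines and spaces.
category = anchor.text.strip()
print(category)  # Business
```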

Answer 1

/text() is used to get the inner text of an element. To extract the href attribute, use /@href instead.
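A minimal sketch of the difference between /text() and /@href, again using the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors and assuming a hypothetical wrapping <div>. In Scrapy the attribute version would be the same XPath ending in /@href, followed by .get() or .getall().

```python
import xml.etree.ElementTree as ET

HTML = """<div>
<a class="text size-1x-small font-accent color-brand all-caps"
   href="http://time.com/section/business"
   data-reactid="199">
       Business
</a>
</div>"""

root = ET.fromstring(HTML)
# Select every <a> whose class attribute equals this exact string.
anchors = root.findall(
    ".//a[@class='text size-1x-small font-accent color-brand all-caps']"
)
# Counterpart of /text(): the element's inner text, whitespace stripped.
texts = [a.text.strip() for a in anchors]
# Counterpart of /@href: the value of the href attribute.
links = [a.get("href") for a in anchors]
print(texts)  # ['Business']
print(links)  # ['http://time.com/section/business']
```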

Here is a handy XPath cheat sheet.

XPath for finding text in an anchor tag using its class (Scrapy)
