Question

hoteldata = response.selector.xpath("//*[@id='js_itemlist']")
for hoteldata in hoteldata:
    title = hoteldata.xpath("//*[@id='([jsheadline_]+\d{5}[0-9])']/span/text()").extract()
    partner_name = hoteldata.xpath("//*[@id='([js_item_]+\d{5}[0-9])']/div[1]/div[2]/div[3]/strong[1]/text()").extract()
    price_single = hoteldata.xpath("//*[@id='([js_item_]+\d{5}[0-9])']/div[1]/div[2]/div[3]/strong[2]/text()").extract()
    print title, partner_name, price_single

No error occurs, but there is no output either.

Answer 1

//*[@id='([jsheadline_]+\d{5}[0-9])']/span/text()

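These, and the other expressions in your post, are valid XPath, but they do not use regular expressions: they simply compare the id against a string that happens to look like a regular expression. If you want to know whether Scrapy supports regular expressions, try the matches() function, which is part of XPath 2.0. I don't know which engine Scrapy uses underneath, but if it does not support matches(), you will get an error.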
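The sketch below illustrates the literal-string point with Python's standard-library xml.etree.ElementTree rather than Scrapy, an assumption made so it runs without Scrapy installed. ElementTree implements only a small subset of XPath, but its [@attr='value'] predicate performs the same plain string comparison. The markup and the id jsheadline_123456 are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one element with an ordinary id, and one whose id is
# literally the regex-looking text used in the question.
DOC = r"""
<div id="js_itemlist">
  <h3 id="jsheadline_123456"><span>Hotel A</span></h3>
  <h3 id="([jsheadline_]+\d{5}[0-9])"><span>Literal id</span></h3>
</div>
"""

REGEX_LOOKING = r"([jsheadline_]+\d{5}[0-9])"

root = ET.fromstring(DOC.strip())

# The value inside [@id='...'] is compared as a plain string, not as a pattern.
literal_hits = [s.text for s in root.findall(f".//*[@id='{REGEX_LOOKING}']/span")]
# The ordinary id is found only when it is spelled out exactly.
real_hits = [s.text for s in root.findall(".//*[@id='jsheadline_123456']/span")]

print(literal_hits)  # ['Literal id']
print(real_hits)     # ['Hotel A']
```

Only the element whose id is character-for-character the pattern text is selected, which is why the expressions in the question return nothing on a normal page.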

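Also, your regular expression looks a bit odd. You use [jsheadline_]+, which is a character class repeated one or more times, but it looks as though you want to test for the literal string "jsheadline_". If that is the case, you can use the contains() function together with the substring-before() and substring-after() functions to test whether that string is present and followed by some digits. These functions are available in every version of XPath; search for them and you will find plenty of examples.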
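A minimal sketch of both points, assuming the third-party lxml package (which Scrapy's selectors are built on) is installed. The first part shows that the character class accepts an id such as lines_123456, which does not contain "jsheadline_" at all. The second part expresses "jsheadline_ followed by exactly six digits" (what \d{5}[0-9] amounts to) with XPath 1.0 string functions. Pinning the prefix with substring-before() and checking the digits with translate() and string-length() is one possible combination, not the only one; the markup and ids are hypothetical.

```python
import re

from lxml import etree  # third-party; assumed to be installed

# Part 1: [jsheadline_]+ matches any run of those characters, e.g. "lines_".
CHAR_CLASS = r"[jsheadline_]+\d{5}[0-9]"
LITERAL = r"jsheadline_\d{6}"
print(bool(re.fullmatch(CHAR_CLASS, "lines_123456")))  # True
print(bool(re.fullmatch(LITERAL, "lines_123456")))     # False

# Part 2: the same test written with XPath 1.0 string functions only.
DOC = """<div id="js_itemlist">
  <h3 id="jsheadline_123456"><span>Hotel A</span></h3>
  <h3 id="jsheadline_12345x"><span>Bad suffix</span></h3>
  <h3 id="xjsheadline_654321"><span>Bad prefix</span></h3>
  <h3 id="lines_123456"><span>No prefix</span></h3>
</div>"""

PREFIX = "jsheadline_"
HEADLINE_XPATH = (
    "//*["
    f"contains(@id, '{PREFIX}')"                                  # prefix present
    f" and substring-before(@id, '{PREFIX}') = ''"                # ...at the very start
    f" and string-length(substring-after(@id, '{PREFIX}')) = 6"   # six characters follow
    f" and translate(substring-after(@id, '{PREFIX}'), '0123456789', '') = ''"  # all digits
    "]/span/text()"
)

root = etree.fromstring(DOC)
titles = [str(t) for t in root.xpath(HEADLINE_XPATH)]
print(titles)  # ['Hotel A']
```

translate() deletes every digit from the suffix, so an empty result means the suffix consisted only of digits; combined with the length check this mirrors six digits without needing regular-expression support in the XPath engine.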

Does Scrapy support regular expressions in XPath?
