Question

I want to use XPath in Scrapy to extract URLs of this type: the link text is a number with an arbitrary number of digits, and the href is random text.

I could come up with something like this:

HtmlXPathSelector(response).select('//a[matches(text(),"\d+")]/@href')

However, XPath 2.0 does not seem to be supported, so I cannot use regular expressions.

The best one-line solution I could find comes from this question: xpath expression for regex-like matching? Is there a better way to achieve this in Scrapy?

Answer 1

.select('//a[. != "" and translate(., "0123456789", "") = ""]/@href')
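This works in XPath 1.0 because translate(., "0123456789", "") deletes every digit from the link text; if nothing is left, the text consisted only of digits, and the . != "" check rules out empty links. The following is a rough plain-Python sketch of that logic, not Scrapy code: xpath_translate, is_all_digits, and the sample links list are hypothetical names and data made up for illustration.

```python
DIGITS = "0123456789"


def xpath_translate(value: str, src: str, dst: str) -> str:
    """Mimic XPath 1.0 translate(): replace src[i] with dst[i];
    characters of src that have no counterpart in dst are removed."""
    out = []
    for ch in value:
        idx = src.find(ch)  # first occurrence in src decides the mapping
        if idx == -1:
            out.append(ch)        # not listed in src: keep unchanged
        elif idx < len(dst):
            out.append(dst[idx])  # mapped to the character at the same position
        # otherwise: listed in src but dst is too short, so the character is dropped
    return "".join(out)


def is_all_digits(text: str) -> bool:
    """Same test as the predicate: . != "" and translate(., "0123456789", "") = "" """
    return text != "" and xpath_translate(text, DIGITS, "") == ""


# Hypothetical (link text, href) pairs standing in for <a> elements.
links = [
    ("123", "/abc"),
    ("", "/empty"),
    ("12a", "/mixed"),
    ("7", "/x9z"),
    (" 42 ", "/padded"),
]

matching_hrefs = [href for text, href in links if is_all_digits(text)]
print(matching_hrefs)  # ['/abc', '/x9z']
```

Note that " 42 " is rejected because the surrounding spaces survive translate(). If the link text may be padded with whitespace, using normalize-space(.) in place of . in both parts of the predicate is one possible adjustment.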