Question

I am using this XPath to get the text() of mailto: links:

//a[starts-with(@href, 'mailto')]/text()

Now I would like to extract what comes after mailto: in the href:

<a href="mailto:info@info.com?subject=hello">here</a>

I want to get: info@info.com?subject=hello

What XPath should I use to get the string after mailto:?

Edit: It seems the mailto: link is generated with JavaScript. Can Scrapy handle this kind of thing?

<script type="text/javascript"> \n </script>

Solution: I think I should use Selenium to handle the JavaScript.

Answer 1

for $a in //a[starts-with(@href, 'mailto')]
    return substring-after(normalize-space($a/@href),'mailto:')

Update:

//a[starts-with(@href, 'mailto')]/substring-after(normalize-space(./@href),'mailto:')
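Both expressions above rely on XPath 2.0 features (the for expression, and a function call as the last path step). Scrapy selectors are built on lxml, which implements XPath 1.0 only, so inside a spider it is usually simpler to select the href values and strip the prefix in Python. The following is a minimal sketch of that idea, not a definitive implementation: it uses the standard library's html.parser as a stand-in so it runs without Scrapy, and the names MailtoCollector and extract_mailto_targets are hypothetical.

```python
from html.parser import HTMLParser

PREFIX = "mailto:"


class MailtoCollector(HTMLParser):
    """Collects the part of each <a href="mailto:..."> value after the prefix."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # strip() plays the role of normalize-space() for surrounding whitespace
        href = (dict(attrs).get("href") or "").strip()
        if href.startswith(PREFIX):
            self.targets.append(href[len(PREFIX):])


def extract_mailto_targets(html):
    """Return everything after 'mailto:' for each mailto link in the markup."""
    collector = MailtoCollector()
    collector.feed(html)
    return collector.targets


sample = '<a href="mailto:info@info.com?subject=hello">here</a>'
print(extract_mailto_targets(sample))  # ['info@info.com?subject=hello']
```

In a Scrapy callback the same prefix stripping could be applied to the list returned by response.xpath("//a[starts-with(@href, 'mailto:')]/@href").getall(). Neither approach helps if the link is only written into the page by JavaScript, because the attribute is then absent from the HTML that Scrapy downloads.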
