Question

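I'm looking for the right XPath expression to find every text() in an HTML page that contains the string: @domain

From each match, I want to extract everything from the first whitespace on the left to the first whitespace on the right,

just to get the email addresses.

Thanks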

Answer 1

This XPath query returns the text of every node that contains "@domain":

//*[contains(text(), '@domain')]/text()

You can then parse that text with Python to extract the email addresses:

>>> import re
>>> re.findall(r'[\w\.]+@domain\.[\w\.]+', 'this is our info: info@domain.co.uk')
['info@domain.co.uk']
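
The question asks for everything between the nearest whitespace on each side of "@domain". A minimal sketch of that variant follows, assuming the XPath results have already been collected into a list of strings. The sample strings, the `TOKEN_RE` pattern, and the `extract_tokens` helper are illustrative and not part of the original answer.

```python
import re

# Hypothetical sample strings standing in for the text nodes
# returned by the XPath query.
texts = [
    "this is our info: info@domain.co.uk",
    "Write to sales@domain.com, or call us.",
    "no address here",
]

# \S* consumes the non-whitespace characters on either side of "@domain",
# so each match runs from the first whitespace on the left to the first
# whitespace on the right.
TOKEN_RE = re.compile(r"\S*@domain\S*")


def extract_tokens(strings):
    """Return every whitespace-delimited token containing '@domain'."""
    found = []
    for s in strings:
        for token in TOKEN_RE.findall(s):
            # Trim punctuation that commonly clings to an address in prose.
            found.append(token.strip(".,;:()<>\"'"))
    return found


emails = extract_tokens(texts)
```

Compared with the `[\w\.]+` pattern above, this version also keeps characters such as `-` or `+` in the local part, at the cost of needing the punctuation trim.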

Update:

It looks like XPath selectors in Scrapy have a re method, which I didn't know about:

>>> hxs.select('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
[u'My image 1',
 u'My image 2',
 u'My image 3',
 u'My image 4',
 u'My image 5']

Python XPath: find text() containing @domain
