Question

HTML:

<html>
 <head>
  <base href='http://example.com/' />
  <title>Example website</title>
 </head>
 <body>
  <div id='demo'>
 <div>Fruite:Apple&nbsp;&nbsp;&nbsp;&nbsp;Sport:Football&nbsp;&nbsp;&nbsp;&nbsp;Language:English</div>
  </div>
 </body>
</html>

I want to get these three results, as follows:

>>> response.xpath('//div[@id="demo"]/div/text()').re(r'')
u'Apple'

>>> response.xpath('//div[@id="demo"]/div/text()').re(r'')
u'Football'

>>> response.xpath('//div[@id="demo"]/div/text()').re(r'')
u'English'

How do I write the regular expression for each re() call above?

Answer 1

response.xpath('//div[@id="demo"]/div/text()').re(r':(\w+)')

Answer 2

\w+ should do it, where \w matches a word character and + is a greedy quantifier.

In: response.xpath('//div[@id="demo"]/div/text()').re(r'\w+')  
Out: ['Fruite', 'Apple', 'Sport', 'Football', 'Language', 'English']

You can prepend : and add a capture group to get only the words that follow a colon: :(\w+)

In: response.xpath('//div[@id="demo"]/div/text()').re(r':(\w+)')
Out: ['Apple', 'Football', 'English']
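
To see why the capture group narrows the result, here is a minimal sketch that uses Python's standard re module in place of the Scrapy selector. The raw string is an assumption: it stands in for the text node selected by the XPath, and html.unescape turns each &nbsp; into a non-breaking space (\xa0), which \w does not match. With exactly one capture group, re.findall returns only the captured part, which mirrors the output shown above.

```python
import html
import re

# Assumed stand-in for the text node selected by the XPath in the question.
raw = ("Fruite:Apple&nbsp;&nbsp;&nbsp;&nbsp;"
       "Sport:Football&nbsp;&nbsp;&nbsp;&nbsp;"
       "Language:English")
text = html.unescape(raw)  # each &nbsp; becomes the non-breaking space "\xa0"

# No capture group: every run of word characters is returned.
all_words = re.findall(r"\w+", text)

# One capture group: only the part inside the parentheses is returned.
values = re.findall(r":(\w+)", text)

print(all_words)
print(values)
```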

If you only want specific words, you can simply list them, separated by | (the "or" operator):

In: response.xpath('//div[@id="demo"]/div/text()').re(r'Apple|Football|English')
Out: ['Apple', 'Football', 'English']
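
Since the question asks for each value separately, another option is to anchor the pattern on the label that precedes the value. The sketch below again uses the standard re module rather than Scrapy; the helper name value_for and the hard-coded text are hypothetical and only serve the illustration.

```python
import re

# Assumed text node content, with "\xa0" standing in for each decoded &nbsp;.
text = ("Fruite:Apple\xa0\xa0\xa0\xa0"
        "Sport:Football\xa0\xa0\xa0\xa0"
        "Language:English")


def value_for(label, text):
    """Return the word that follows 'label:' or None if the label is absent."""
    match = re.search(re.escape(label) + r":(\w+)", text)
    return match.group(1) if match else None


for label in ("Fruite", "Sport", "Language"):
    print(label, "->", value_for(label, text))
```

In Scrapy itself, selectors also provide .re_first(), which returns the first match instead of a list, so a pattern such as r'Sport:(\w+)' could be passed to it directly.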
