I am trying to extract all sentences that contain a specified word from a text.
txt="I like to eat apple. Me too. Let's go buy some apples."
txt = "." + txt
re.findall(r"\."+".+"+"apple"+".+"+"\.", txt)
But it returns:
[".I like to eat apple. Me too. Let's go buy some apples."]
instead of:
[".I like to eat apple.", "Let's go buy some apples."]
Can anyone help?
Answer 0 (score: 17)
No need for a regex:
>>> txt = "I like to eat apple. Me too. Let's go buy some apples."
>>> [sentence + '.' for sentence in txt.split('.') if 'apple' in sentence]
['I like to eat apple.', " Let's go buy some apples."]
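The second result above keeps the space that followed the previous period, and `txt.split('.')` also yields an empty trailing piece. A minimal sketch of the same split-based idea wrapped in a function that trims each piece; the name `sentences_containing` is made up for illustration, and the check is still a plain substring test:

```python
def sentences_containing(word, text):
    # Split on periods, trim surrounding whitespace, and keep the non-empty
    # pieces that contain `word` as a substring (so "apples" matches too).
    pieces = (piece.strip() for piece in text.split("."))
    return [piece + "." for piece in pieces if piece and word in piece]


txt = "I like to eat apple. Me too. Let's go buy some apples."
print(sentences_containing("apple", txt))
```

Like the one-liner, this assumes that every period ends a sentence, which will not hold for abbreviations or decimal numbers.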
Answer 1 (score: 12)
In [3]: re.findall(r"([^.]*?apple[^.]*\.)",txt)
Out[4]: ['I like to eat apple.', " Let's go buy some apples."]
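For readers who want the pattern spelled out, here is a rough sketch that builds the same expression for an arbitrary word. The helper name `sentences_with` is hypothetical, and the `re.escape` call is an addition not present in the answer, assumed here so that words containing regex metacharacters are treated literally:

```python
import re


def sentences_with(word, text):
    # [^.]*?   lazily consume non-period characters before the word
    # [^.]*\.  consume the rest of the sentence, up to and including its period
    pattern = r"[^.]*?" + re.escape(word) + r"[^.]*\."
    return re.findall(pattern, text)


txt = "I like to eat apple. Me too. Let's go buy some apples."
print(sentences_with("apple", txt))
```

Because the character class excludes periods, a single match can never cross a sentence boundary.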
Answer 2 (score: 8)
In [7]: import re
In [8]: txt=".I like to eat apple. Me too. Let's go buy some apples."
In [9]: re.findall(r'([^.]*apple[^.]*)', txt)
Out[9]: ['I like to eat apple', " Let's go buy some apples"]
Note, however, that @jamylak's split-based solution is faster:
In [10]: %timeit re.findall(r'([^.]*apple[^.]*)', txt)
1000000 loops, best of 3: 1.96 us per loop
In [11]: %timeit [s+ '.' for s in txt.split('.') if 'apple' in s]
1000000 loops, best of 3: 819 ns per loop
For larger strings the speed difference is smaller, but still significant:
In [24]: txt = txt*10000
In [25]: %timeit re.findall(r'([^.]*apple[^.]*)', txt)
100 loops, best of 3: 8.49 ms per loop
In [26]: %timeit [s+'.' for s in txt.split('.') if 'apple' in s]
100 loops, best of 3: 6.35 ms per loop
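Outside IPython, a comparison along these lines could be reproduced with the standard `timeit` module. This is only a sketch: the function names `with_regex` and `with_split` and the repeat count of 20 are arbitrary choices, and the absolute numbers will differ by machine and Python version:

```python
import re
import timeit

txt = "I like to eat apple. Me too. Let's go buy some apples."


def with_regex(text):
    return re.findall(r"[^.]*apple[^.]*", text)


def with_split(text):
    return [s + "." for s in text.split(".") if "apple" in s]


# Time both approaches on the short string and on a 10000x repetition of it.
for label, text in (("short", txt), ("long", txt * 10000)):
    for fn in (with_regex, with_split):
        seconds = timeit.timeit(lambda: fn(text), number=20)
        print(f"{label:5s} {fn.__name__:10s} {seconds / 20:.6f} s per call")
```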
Answer 3 (score: 3)
You can use str.split:
>>> txt="I like to eat apple. Me too. Let's go buy some apples."
>>> txt.split('. ')
['I like to eat apple', 'Me too', "Let's go buy some apples."]
>>> [ t for t in txt.split('. ') if 'apple' in t]
['I like to eat apple', "Let's go buy some apples."]
Answer 4 (score: 2)
r"\."+".+"+"apple"+".+"+"\."
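This line is a bit odd; why concatenate so many separate strings? You could just write r'\..+apple.+\.'.
Anyway, the problem with your regular expression is its greediness. By default, x+ matches x as many times as possible. So .+ matches as many characters (any characters) as possible, including dots and further occurrences of apple.
What you want instead is a non-greedy expression; you can usually get one by appending ? to the quantifier, which turns .+ into .+?.
This gives you the following result: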
['.I like to eat apple. Me too.']
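As you can see, the match no longer spans both apple sentences, but it still includes Me too. That is because the pattern still requires at least one character and then a . after apple, so the match has to run past the first sentence's own period to the next one, which makes it impossible not to capture the following sentence as well.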
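The difference between the greedy and the non-greedy quantifier can be checked directly. A small sketch using the question's text with the leading dot prepended; the variable names `greedy` and `lazy` are just for illustration:

```python
import re

txt = "." + "I like to eat apple. Me too. Let's go buy some apples."

# Greedy: .+ grabs as much as possible, so one match spans the whole text.
greedy = re.findall(r"\..+apple.+\.", txt)

# Non-greedy: .+? stops as early as it can, but it still needs one character
# plus a literal dot after "apple", so "Me too." is swallowed as well.
lazy = re.findall(r"\..+?apple.+?\.", txt)

print(greedy)
print(lazy)
```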
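A working regular expression would be: r'\.[^.]*?apple[^.]*?\.'
Here you no longer match arbitrary characters, only characters that are not dots themselves. The * quantifier also allows matching no characters at all (because in the first sentence there are no non-dot characters after apple). Using that expression gives: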
['.I like to eat apple.', ". Let's go buy some apples."]
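To see why the quantifier has to allow zero characters, compare the expression with a variant that uses + in place of *. This is an illustrative sketch; the names `star` and `plus` are not from the answer:

```python
import re

txt = ".I like to eat apple. Me too. Let's go buy some apples."

# * allows zero non-dot characters after "apple", so the first sentence matches.
star = re.findall(r"\.[^.]*?apple[^.]*?\.", txt)

# + demands at least one non-dot character after "apple"; the first sentence
# ends right after the word, so only the "apples" sentence is found.
plus = re.findall(r"\.[^.]+?apple[^.]+?\.", txt)

print(star)
print(plus)
```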
Answer 5 (score: 0)
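Obviously, the sample in the question extracts sentences containing a substring rather than sentences containing a word. A word can appear at the beginning, in the middle, or at the end of a sentence. Not limited to the example in the question, here is a general function for searching for a word in a sentence:
import re

def searchWordinSentence(word, sentence):
    # \b matches at word boundaries, so "apple" does not match inside "apples"
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    return re.search(pattern, sentence) is not None
For the example in the question, it can be used as follows:
txt = "I like to eat apple. Me too. Let's go buy some apples."
word = "apple"
print([t for t in txt.split('. ') if searchWordinSentence(word, t)])
The corresponding output is:
['I like to eat apple']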