Question

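Given a paragraph such as

This is figure 3a. This is fig 4a. I like (figure 5). This is important (fig 6a).

I want a Python regular expression that extracts a sentence based on its figure number. I am trying: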

"This is figure 3a" using ([^.]*?fig. 3[^.].)
"This is fig 4a" using ([^.]*?fig. 4[^.].)
"I like (figure 5)" using ([^.]*?fig. 5[^.].)
"This is important (fig 6a)" using ([^.]*?fig. 6[^.].)

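But the match is not specific. For example, the pattern for fig 4 extracts all the figures. I only want the one sentence that matches the given figure number.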

Answer 1

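You need to replace

.*

after the 4 with

[^.]*

and replace the 4 with \d. Because [^.]* cannot cross a period, each match stays inside a single sentence, and \d matches any figure number.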

Code:

In[3]: s = "This is figure 3a. This is fig 4a . I like (figure 5). This is important (fig 6a)."
In[4]: import re
In[5]: re.findall(r'[^.]*?fig[^.]*\d[^.]*', s)
Out[5]: 
['This is figure 3a',
 ' This is fig 4a ',
 ' I like (figure 5)',
 ' This is important (fig 6a)']

Or, to drop the surrounding whitespace:

In[8]: re.findall(r'\s*([^.]*?fig[^.]*\d[^.]*?)(?=\s*\.)', s)
Out[8]: 
['This is figure 3a',
 'This is fig 4a',
 'I like (figure 5)',
 'This is important (fig 6a)']
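
The patterns above return every sentence that mentions a figure. To pull out only the sentence for one particular figure number, which is what the question asks for, the number can be substituted into the pattern. The following is a minimal sketch, not part of the original answer: the helper name sentences_for_figure is hypothetical, and it assumes that a period only ever ends a sentence (so "fig. 4" with a dot inside the sentence would not work) and that the figure is written as "fig" or "figure" followed by the number.

```python
import re

def sentences_for_figure(text, number):
    """Return the sentences in text that mention the given figure number.

    Hypothetical helper for illustration; assumes '.' appears only as a
    sentence terminator.
    """
    pattern = re.compile(
        r'\s*('                      # skip leading whitespace, then capture
        r'[^.]*?'                    # sentence text before the figure mention
        r'\bfig(?:ure)?\s*'          # "fig" or "figure", optional spaces
        + re.escape(str(number)) +   # the requested figure number
        r'(?!\d)'                    # so that 4 does not match 40 or 41
        r'[^.]*?'                    # rest of the sentence (e.g. "a", ")")
        r')(?=\s*\.)',               # stop before the closing period
        re.IGNORECASE,
    )
    return pattern.findall(text)

s = "This is figure 3a. This is fig 4a . I like (figure 5). This is important (fig 6a)."
print(sentences_for_figure(s, 4))
```

The negative lookahead (?!\d) is what makes the match specific: without it, asking for figure 4 would also accept figure 40.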

Python regex to match a pattern in a sentence
