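I am extracting text from a PDF and want to search for expressions like P50+P60, but the text also contains terms like P50+P40+P30. How can I match only the structure Pxx+Pxx (x = digit) and skip anything of the form Pxx+Pxx+Pxx?
This is what I tried: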
List = re.findall('(P\d\d+P\d\d[^\+P\d\d])', String)
But this also returns P50+P40 from the term P50+P40+P30. I have tried many variations but cannot solve the problem.
Answer 0 (score: 1)
Use
re.findall(r'(?<!P\d\d\+)P\d\d\+P\d\d(?!\+P\d\d)', String)
Explanation
--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    P                        'P'
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
    \+                       '+'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  P                        'P'
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  \+                       '+'
--------------------------------------------------------------------------------
  P                        'P'
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \+                       '+'
--------------------------------------------------------------------------------
    P                        'P'
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
  )                        end of look-ahead
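To see the two lookarounds in action, here is a minimal sketch. The sample string and the names PAIR and find_pairs are made up for illustration and do not come from the question; only the pattern itself is taken from the answer.

```python
import re

# Pattern from the answer: a Pxx+Pxx pair that is neither preceded by "Pxx+"
# nor followed by "+Pxx", so members of longer chains are rejected.
PAIR = re.compile(r'(?<!P\d\d\+)P\d\d\+P\d\d(?!\+P\d\d)')


def find_pairs(text):
    """Return every standalone Pxx+Pxx expression found in text."""
    return PAIR.findall(text)


# Hypothetical sample text standing in for the extracted PDF content.
sample = "Load P50+P60 here, chain P50+P40+P30 there, and P10+P20 at the end."
print(find_pairs(sample))  # ['P50+P60', 'P10+P20']
```

The look-behind rejects P40+P30 because it is preceded by "P50+", and the look-ahead rejects P50+P40 because it is followed by "+P30", so nothing from the three-term chain is returned. Python's re module requires a fixed-width look-behind; `P\d\d\+` is always four characters, so it qualifies.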