I have the following text:
'- `Popen.``terminate`()\n\n Stop the child. On Posix OSs the method sends SIGTERM to the child. On Windows the Win32 API function `TerminateProcess()` is called to stop the child.\n\n\n- `Popen.``kill`()\n\n Kills the child. On Posix OSs the function sends SIGKILL to the child. On Windows;...
I tried to extract the list of names from the text:
In [46]: pattern = re.compile(r'-\s(.+)\n\n')
In [49]: matches = pattern.findall(content)
In [50]: matches
Out[50]:
['`Popen.``terminate`()',
'`Popen.``kill`()',
'`Popen.``args`',
'`Popen.``stdin`',
'`Popen.``stdout`']
The result I want is:
['Popen.terminate()',
'Popen.kill()',
'Popen.args',
'Popen.stdin',
'Popen.stdout']
I changed the regex to use two groups so that it captures only the qualified parts:
In [55]: pattern2 = re.compile(r'- `(\w+).``(\w+.*)`')
In [64]: matches = pattern2.findall(content)
In [65]: matches
Out[65]:
[('Popen', 'terminate'),
('Popen', 'kill'),
('Popen', 'args'),
('Popen', 'stdin'),
('Popen', 'stdout')]
That is still not the result I want.
How can I fix this?
Answer 0 (score: 0)
-\s`([^`]*)``([^`]*)`((?:\(\))?)\n\n
import re
r = re.compile(r"-\s`([^`]*)``([^`]*)`((?:\(\))?)\n\n")
s = ("'- `Popen.``terminate`()\n\n"
" Stop the child. On Posix OSs the method sends SIGTERM to the child. On Windows the Win32 API function `TerminateProcess()` is called to stop the child.\n\n\n"
"- `Popen.``kill`()\n\n"
" Kills the child. On Posix OSs the function sends SIGKILL to the child. On Windows;...\n")
for m in r.finditer(s):
    # Join the three groups: object name, member name, and the optional "()"
    print(m.group(1) + m.group(2) + m.group(3))
Note: the output below does not match the OP's expected output, because the OP posted only part of the string rather than the full text.
Popen.terminate()
Popen.kill()
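To get a list like the one the question asks for, rather than printed lines, the three groups of each match can be joined in a list comprehension. This is only a sketch: the sample string below is an assumption, and in particular the `args` entry and its description are made up for illustration, since the full original text was not posted.

```python
import re

# Same pattern as in the answer; group 3 is "()" for methods and "" for attributes.
pattern = re.compile(r"-\s`([^`]*)``([^`]*)`((?:\(\))?)\n\n")

# Hypothetical sample text: the `args` entry and the shortened descriptions
# are assumptions, not the OP's actual content.
content = (
    "- `Popen.``terminate`()\n\n"
    " Stop the child.\n\n\n"
    "- `Popen.``kill`()\n\n"
    " Kills the child.\n\n\n"
    "- `Popen.``args`\n\n"
    " The args argument as it was passed to Popen.\n"
)

# findall returns one tuple of groups per match; joining each tuple gives the name.
names = ["".join(groups) for groups in pattern.findall(content)]
print(names)  # ['Popen.terminate()', 'Popen.kill()', 'Popen.args']
```

Because the third group is optional, attribute entries without parentheses still match and simply contribute an empty string to the join.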
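Explanation of the pattern, piece by piece:
-    matches the hyphen character literally
\s    matches a whitespace character
`    matches the backtick (grave accent) character literally
([^`]*)    captures any number of characters not in the set (that is, any character except a backtick) into capture group 1
``    matches two backtick characters literally
([^`]*)    captures any number of characters not in the set (any character except a backtick) into capture group 2
`    matches the backtick character literally
((?:\(\))?)    captures the following into capture group 3:
    (?:\(\))?    matches the following zero or one time:
        \(\)    matches the opening and closing parentheses () literally
\n\n    matches two newline characters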
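The same breakdown can be kept next to the pattern itself by compiling it with `re.VERBOSE`, which ignores whitespace inside the pattern and allows `#` comments. A minimal sketch follows; the names `NAME_PATTERN` and `extract_names` are hypothetical and chosen only for this illustration, and the sample string uses shortened descriptions.

```python
import re

# The answer's pattern, rewritten with re.VERBOSE so each piece carries its own comment.
NAME_PATTERN = re.compile(
    r"""
    -\s            # a literal hyphen, then one whitespace character
    `([^`]*)``     # opening backtick, group 1 (no backticks), then two backticks
    ([^`]*)`       # group 2 (no backticks), then the closing backtick
    ((?:\(\))?)    # group 3: an optional pair of literal parentheses
    \n\n           # two newline characters
    """,
    re.VERBOSE,
)


def extract_names(text):
    """Return the joined groups of every match, in order of appearance."""
    return ["".join(m.groups()) for m in NAME_PATTERN.finditer(text)]


# Shortened sample text, used here only for illustration.
sample = (
    "- `Popen.``terminate`()\n\n Stop the child.\n\n\n"
    "- `Popen.``kill`()\n\n Kills the child.\n"
)
print(extract_names(sample))  # ['Popen.terminate()', 'Popen.kill()']
```

The `\n\n` at the end still works in verbose mode because it is an escape sequence rather than literal whitespace in the pattern.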