Extract elements separately from text

Time: 2017-12-06 14:46:42

Tags: regex

I have the following text:

'- `Popen.``terminate`()\n\n  Stop the child. On Posix OSs the method sends SIGTERM to the child. On Windows the Win32 API function `TerminateProcess()` is called to stop the child.\n\n\n- `Popen.``kill`()\n\n  Kills the child. On Posix OSs the function sends SIGKILL to the child. On Windows;...

I tried to extract the list entries from the text:

In [46]: pattern = re.compile(r'-\s(.+)\n\n')
In [49]: matches = pattern.findall(content)
In [50]: matches
Out[50]:
['`Popen.``terminate`()',
 '`Popen.``kill`()',
 '`Popen.``args`',
 '`Popen.``stdin`',
 '`Popen.``stdout`']

The result I want is:

['Popen.terminate()',
 'Popen.kill()',
 'Popen.args',
 'Popen.stdin',
 'Popen.stdout']

I changed the regex to use two groups so that it captures only the parts I need:

In [55]: pattern2 = re.compile(r'- `(\w+).``(\w+.*)`')
In [64]: matches = pattern2.findall(content)
In [65]: matches
Out[65]:
[('Popen', 'terminate'),
 ('Popen', 'kill'),
 ('Popen', 'args'),
 ('Popen', 'stdin'),
 ('Popen', 'stdout')]

It is still not the result I want.

How can I solve this?

1 answer:

Answer 0 (score: 0)

Code

See regex in use here

-\s`([^`]*)``([^`]*)`((?:\(\))?)\n\n

Usage

See code in use here

import re

r = re.compile(r"-\s`([^`]*)``([^`]*)`((?:\(\))?)\n\n")

s = ("'- `Popen.``terminate`()\n\n"
    "  Stop the child. On Posix OSs the method sends SIGTERM to the child. On Windows the Win32 API function `TerminateProcess()` is called to stop the child.\n\n\n"
    "- `Popen.``kill`()\n\n"
    "  Kills the child. On Posix OSs the function sends SIGKILL to the child. On Windows;...\n")

# Iterate over every match and join the three groups into one name.
for m in r.finditer(s):
    print(m.group(1) + m.group(2) + m.group(3))

Results

Input

'- `Popen.``terminate`()\n\n  Stop the child. On Posix OSs the method sends SIGTERM to the child. On Windows the Win32 API function `TerminateProcess()` is called to stop the child.\n\n\n- `Popen.``kill`()\n\n  Kills the child. On Posix OSs the function sends SIGKILL to the child. On Windows;...

Output

Note: the output below does not match the OP's expected output, because the OP posted only part of the string rather than the full string.

Popen.terminate()
Popen.kill()
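
To get a list like the one the OP asked for, rather than printed lines, the same pattern can be used with findall. The following is a minimal sketch, not the answer's own code: the `sample` string is hypothetical, with the first two entries shortened from the question's text and the `Popen.args` entry an assumed shape, since the full string was not posted.

```python
import re

# Same pattern as in the answer: prefix, name, and an optional "()".
pattern = re.compile(r"-\s`([^`]*)``([^`]*)`((?:\(\))?)\n\n")

# Hypothetical sample: descriptions are shortened, and the `Popen.args`
# entry is assumed, because the question did not include the full string.
sample = (
    "- `Popen.``terminate`()\n\n"
    "  Stop the child.\n\n\n"
    "- `Popen.``kill`()\n\n"
    "  Kills the child.\n\n\n"
    "- `Popen.``args`\n\n"
    "  The args argument as it was passed to Popen.\n"
)

# With several groups, findall returns one tuple per match;
# joining each tuple gives the flat name.
names = ["".join(groups) for groups in pattern.findall(sample)]
print(names)
```

This also shows why the OP's second attempt returned tuples: findall yields a tuple of groups for each match whenever the pattern contains more than one capture group.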

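Explanation

  • - matches the hyphen character - literally
  • \s matches any whitespace character
  • ` matches the backtick (grave accent) character literally
  • ([^`]*) captures any number of characters not in the set (any character except the backtick `) into capture group 1
  • `` matches two backtick characters literally
  • ([^`]*) captures any number of characters not in the set (any character except the backtick `) into capture group 2
  • ` matches the backtick character literally
  • ((?:\(\))?) captures the following into capture group 3
    • (?:\(\))? matches the following zero or one time
      • \(\) matches the opening and closing parentheses () literally
  • \n\n matches two newline characters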
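
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is only a sketch of an equivalent spelling; the name `verbose_pattern` and the one-entry `text` string are illustrative and do not come from the original answer.

```python
import re

# The same pattern, spread over several lines with inline comments.
# In verbose mode, whitespace inside the pattern is ignored.
verbose_pattern = re.compile(
    r"""
    -\s            # a literal hyphen followed by one whitespace character
    `([^`]*)`      # group 1: text between the first pair of backticks
    `([^`]*)`      # group 2: text between the second pair of backticks
    ((?:\(\))?)    # group 3: an optional pair of literal parentheses
    \n\n           # two newline characters
    """,
    re.VERBOSE,
)

# Illustrative one-entry input, shortened from the question's text.
text = "- `Popen.``kill`()\n\n  Kills the child.\n"
match = verbose_pattern.search(text)
print(match.groups())
```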