Question

I am using Python to search a file line by line for sections and subsections.

   *** Section with no sub section
  *** Section with sub section ***
           *** Sub Section ***
  *** Another section

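Sections start with 0-2 spaces followed by three asterisks; subsections have more than 2 spaces before the asterisks.

I currently write out the section/subsection names without the "***"s (using re.sub):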

Section: Section with no sub section
Section: Section with sub section
Sub-Section: Sub Section
Section: Another Section

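Question 1: Is there a Python regexp with capture groups that would let me access the section/subsection names as capture groups?

Question 2: How would the regexp groups let me identify a section or subsection (possibly based on the number of groups, or their content, in match.group)?

Example (non-working):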

match=re.compile('(group0 *** )(group1 section title)(group2 ***)')
sectionTitle = match.group(1)
if match.lastindex = 0: sectionType = section with no subs
if match.lastindex = 1: sectionType = section with subs
if match.lastindex = 2: sectionType = sub section

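Previous attempts: I have been able to capture sections or subsections using separate regexps and if statements, but I would like to do it all at once. Something like the line below; I am having trouble with the greediness of the second group.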

'(^\*{3}\s)(.*)(\s\*{3}$)'

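I can't seem to get the greedy or optional groups to work together. http://pythex.org/ has been very helpful for this.

I also tried capturing the asterisks with '(\*{3})' and then deciding between section and subsection based on the number of groups found.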

sectionRegex = re.compile(r'(\*{3})')
m = re.search(sectionRegex, line)
if m.lastindex == 0:
    sectionName = re.sub(sectionRegex, '', line)
    # Set a section flag
if m.lastindex == 1:
    sectionName = re.sub(sectionRegex, '', line)
    # Set a sub section flag.

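Thanks. Maybe I am going about this completely wrong. Any help is appreciated.

Latest update: I have been playing with Pythex, the answers, and other research. I am now spending more time on capturing the words: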

^[a-zA-Z]+$

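and counting the number of asterisk matches to determine the "level". I am still looking for a single regexp that matches the two to three "groups". It may not exist.

Thanks.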

Answer 1

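Question 1: Is there a Python regexp with capture groups that would let me access the section/subsection names as capture groups?

a single regexp that matches the two to three "groups". It may not exist.

Yes, it can be done. We can break the conditions down into the following tree: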

Start of line + 0 to 2 spaces
Either of 2 alternatives:
1. *** + any text [group 1]
2. 1+ spaces + *** + any text [group 2]
*** (optional) + end of line

The tree above can be expressed with the pattern:

^[ ]{0,2}(?:[*]{3}(.*?)|[ ]+[*]{3}(.*?))(?:[*]{3})?$
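As a rough illustration, the same pattern can be spelled out with re.VERBOSE so that each branch of the tree sits on its own commented line. The name TREE is only chosen for this sketch; the regex itself is unchanged, and the bracketed [ ] spaces are still significant in verbose mode.

```python
import re

# Verbose spelling of the pattern above; whitespace outside [...] and
# everything after '#' is ignored by the regex engine in this mode.
TREE = re.compile(r'''
    ^[ ]{0,2}            # start of line + 0 to 2 spaces
    (?:
        [*]{3}(.*?)      # alternative 1: *** + any text            (group 1)
      |
        [ ]+[*]{3}(.*?)  # alternative 2: 1+ spaces + *** + any text (group 2)
    )
    (?:[*]{3})?$         # optional closing *** + end of line
''', re.VERBOSE)

m = TREE.match('  *** Section with sub section ***')
print(m.groups())
```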


Note that Section and Sub-Section are captured by different groups (group 1 and group 2 respectively). Both use the same syntax, .*?, with a lazy quantifier (the extra "?"), so that the optional "***" at the end can still match.
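Since the question asks about lastindex, here is a minimal sketch of how it could be combined with this pattern. Only one of the two groups takes part in any match, so match.lastindex is 1 for a section and 2 for a subsection. The function name classify and the returned labels are assumptions made for this example.

```python
import re

# Unnamed-group pattern from above.
PATTERN = re.compile(r'^[ ]{0,2}(?:[*]{3}(.*?)|[ ]+[*]{3}(.*?))(?:[*]{3})?$')

def classify(line):
    """Return (kind, title) for a heading line, or None for any other line."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # lastindex is the index of the last group that took part in the match:
    # 1 for the first alternative, 2 for the second.
    kind = 'Section' if m.lastindex == 1 else 'Sub-Section'
    # The captured text still carries the surrounding spaces, so strip them.
    return kind, m.group(m.lastindex).strip()

print(classify('  *** Section with sub section ***'))
print(classify('           *** Sub Section ***'))
```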

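Question 2: How would the regexp groups let me identify a section or subsection (possibly based on the number of groups, or their content, in match.group)?

The regex above captures sections only in group 1 and subsections only in group 2. To make them easier to identify in code, I'll use (?P<named> groups) and retrieve the captures with .groupdict().

Code: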

import re

data = """  *** Section with no sub section
  *** Section with sub section ***
           *** Sub Section ***
  *** Another section"""

pattern = r'^[ ]{0,2}(?:[*]{3}[ ]?(?P<Section>.*?)|[ ]+[*]{3}[ ]?(?P<SubSection>.*?))(?:[ ]?[*]{3})?$'
regex = re.compile(pattern, re.M)

for match in regex.finditer(data):
    print(match.groupdict())

''' OUTPUT:
{'Section': 'Section with no sub section', 'SubSection': None}
{'Section': 'Section with sub section', 'SubSection': None}
{'Section': None, 'SubSection': 'Sub Section'}
{'Section': 'Another section', 'SubSection': None}
'''

Instead of printing the dict, you can use any of the following to refer to each Section / Subsection:

match.group("Section")
match.group(1)
match.group("SubSection")
match.group(2)
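For a line-by-line scan like the one in the question, match.lastgroup gives the name of the group that matched, which can serve directly as the section type. The sketch below assumes the named-group pattern above; iter_headings, LABELS and the sample list are hypothetical names introduced only for illustration.

```python
import re

PATTERN = re.compile(
    r'^[ ]{0,2}(?:[*]{3}[ ]?(?P<Section>.*?)'
    r'|[ ]+[*]{3}[ ]?(?P<SubSection>.*?))(?:[ ]?[*]{3})?$'
)
# Map group names to the labels used in the question's desired output.
LABELS = {'Section': 'Section', 'SubSection': 'Sub-Section'}

def iter_headings(lines):
    """Yield 'Label: title' for every heading line; other lines are skipped."""
    for line in lines:
        m = PATTERN.match(line.rstrip('\n'))
        if m:
            # lastgroup is the name of the last group that took part in the match.
            yield '{}: {}'.format(LABELS[m.lastgroup], m.group(m.lastgroup))

sample = [
    '  *** Section with no sub section\n',
    '  *** Section with sub section ***\n',
    'some body text\n',
    '           *** Sub Section ***\n',
    '  *** Another section\n',
]
for heading in iter_headings(sample):
    print(heading)
```

The same generator should work on an open file object, since iterating over a file yields its lines one at a time.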

Answer 2

Assuming your subsections have 3 or more spaces, you can do this:

import re

data = '''
  *** Section with no sub section
*** Section with sub section ***
           *** Sub Section ***
 *** Another section
'''

pattern = r'(?:(^ {0,2}\*{3}.*\*{3} *$)|(^ {0,2}\*{3}.*)|(^ *\*{3}.*\*{3} *$))'

regex = re.compile(pattern, re.M)
print(regex.findall(data))

This will give you groups like the following:

[('', '  *** Section with no sub section', ''),
 ('*** Section with sub section ***', '', ''),
 ('', '', '           *** Sub Section ***'),
 ('', ' *** Another section', '')]
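One possible way to turn these tuples into labelled names is to look at which position in each tuple is non-empty. This is only a sketch: LABELS and label_headings are names made up for the example, and stripping with ' *' assumes that titles never begin or end with an asterisk.

```python
import re

pattern = r'(?:(^ {0,2}\*{3}.*\*{3} *$)|(^ {0,2}\*{3}.*)|(^ *\*{3}.*\*{3} *$))'
regex = re.compile(pattern, re.M)

# Position in the findall tuple -> kind of heading.
LABELS = ('Section', 'Section', 'Sub-Section')

def label_headings(text):
    """Return a list of (label, title) pairs for every heading found in text."""
    results = []
    for groups in regex.findall(text):
        for label, raw in zip(LABELS, groups):
            if raw:  # exactly one slot per tuple is non-empty
                # Remove the surrounding spaces and asterisks.
                results.append((label, raw.strip(' *')))
    return results

data = '''
  *** Section with no sub section
*** Section with sub section ***
           *** Sub Section ***
 *** Another section
'''
print(label_headings(data))
```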

Answer 3

The regex below captures 3 or 4 groups, depending on whether the closing *** is present:

(^\s+)(\*{3})([a-zA-Z\s]+)(\*{3})*
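A minimal sketch of how this regex might be used: the length of the leading whitespace in group 1 can separate sections from subsections (using the question's threshold of 2 spaces), and lastindex tells whether the closing *** was captured. The function name describe is hypothetical. Note that the pattern requires at least one leading whitespace character, so a heading that starts in column 0 does not match.

```python
import re

PATTERN = re.compile(r'(^\s+)(\*{3})([a-zA-Z\s]+)(\*{3})*')

def describe(line):
    """Return (kind, title, has_closing_stars), or None if the line does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # Assumption taken from the question: up to 2 leading spaces means a section.
    kind = 'Section' if len(m.group(1)) <= 2 else 'Sub-Section'
    # Group 4 only takes part in the match when a closing *** exists,
    # so lastindex is 4 in that case and 3 otherwise.
    has_closing = m.lastindex == 4
    return kind, m.group(3).strip(), has_closing

print(describe('  *** Section with sub section ***'))
print(describe('  *** Another section'))
```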

Python regex: optional capture groups or lastindex
