Using a regex 'or' and capturing groups at the same time

Asked: 2017-09-06 23:55:09

Tags: python regex

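I'm trying to match the expression that follows 'ephname' below (its form depends on which file exists), but I only want to capture the digits: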

entries = ['other data\nephdelay = 12\nephname = cfghjk78.comb\nother data', 'other data\nephdelay = 17\nephname = qwerty.s92\nother data']

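I'm using the following as my regex, but no match shows up (if I keep only one alternative and drop the '|', it works, but only for that one case):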

\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?

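I tested this on a regex website, but I can't see what the problem is. I want the output to be '92' or '78', depending on the entry. Why am I not getting any matches?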

Edit: In my code I have this:

import re
commonterms = (["term1", "#term1pattern"],
               ["ephsol", "\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?"],
               ["term3", "#term3pattern"], ...)

terms = [commonterms[i][0] for i in range(len(commonterms))]
patterns = [commonterms[i][1] for i in range(len(commonterms))]

d = {t: [] for t in terms}

def getTerms(entry):
    for i in range(len(terms)):
        term = re.search(patterns[i], entry)
        term = term.groups()[0] if term else 'NULL'
    return d

for entry in entries:
    d = getTerms(entry)

print d['ephsol']

Then when I print d['ephsol'], I only get a bunch of NULLs, but I know there should be matches.

3 Answers:

Answer 0 (score: 3)

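The problem you are running into is that the match lands in one of two different groups, depending on which alternative matched: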

import re

entries = ['other data\nephdelay = 12\nephname = cfghjk78.comb\nother data', 'other data\nephdelay = 17\nephname = qwerty.s92\nother data']

for e in entries:
    m=re.search(r'\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?', e)
    if m:
        print "Group 1: {}, Group 2: {} {}".format(m.group(1), m.group(2), m.groups())

Prints:

Group 1: None, Group 2: 78 (None, '78')
Group 1: 92, Group 2: None ('92', None)

To print whichever group matched, you can do this:

for e in entries:
    m=re.search(r'\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?', e)
    if m:
        print m.group(1) if m.group(1) is not None else m.group(2)

Prints:

78
92
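
The same idea generalizes to any number of alternatives. The following is a minimal sketch, not part of the original answer: the helper name first_captured and the 'NULL' default are assumptions chosen to mirror the question. It returns the first group that actually took part in the match.

```python
import re

# The two-alternative pattern from the question, compiled once.
PATTERN = re.compile(
    r'\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?'
)

def first_captured(match, default='NULL'):
    """Return the first capture group that is not None (hypothetical helper)."""
    if match is None:
        return default
    return next((g for g in match.groups() if g is not None), default)

entries = [
    'other data\nephdelay = 12\nephname = cfghjk78.comb\nother data',
    'other data\nephdelay = 17\nephname = qwerty.s92\nother data',
]

captured = [first_captured(PATTERN.search(e)) for e in entries]
print(captured)  # ['78', '92']
```

Checking for None explicitly, rather than for truthiness, keeps an empty-string capture from being skipped by mistake.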

Another option is to change your regex so that the capture is always in group 1:

for e in entries:
    m=re.search(r'^ephname[ \t]*=[ \t]*[^0-9\n]*(\d+)(?:\.comb|\s)', e, flags=re.M)
    if m:
        print m.group(1)

Prints:

78
92
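
With the capture always in group 1, the groups()[0] expression from the question works as written. A small sketch under that assumption (the names EPH and eph_number are illustrative, not from the thread):

```python
import re

# Single-group pattern from the answer above; re.M lets '^' match at each line start.
EPH = re.compile(r'^ephname[ \t]*=[ \t]*[^0-9\n]*(\d+)(?:\.comb|\s)', re.M)

entries = [
    'other data\nephdelay = 12\nephname = cfghjk78.comb\nother data',
    'other data\nephdelay = 17\nephname = qwerty.s92\nother data',
]

def eph_number(entry):
    """Return the captured digits, or 'NULL' when the entry has no ephname line."""
    term = EPH.search(entry)
    return term.groups()[0] if term else 'NULL'

results = [eph_number(e) for e in entries]
print(results)  # ['78', '92']
```

Note that this pattern expects either '.comb' or a whitespace character after the digits, so an ephname line that ends the string right after the digits would not match.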

Answer 1 (score: 2)

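I think you need to get rid of some of your '?' characters, because they are messing up the match. A '+' quantifier must match at least once. It also helps to use re.DOTALL so that the '.' character matches newlines ('\n') as well.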

Here is what I came up with:

import re

entries = ['other data\nephdelay = 12\nephname = cfghjk78.comb\nother data', 'other data\nephdelay = 17\nephname = qwerty.s92\nother data']

pattern = r'.*ephname\s*=\s*[a-zA-Z\.]*(\d+)\s*|.*ephname\s*=\s*[a-zA-Z\.]*(\d+)\.comb\s*'

pObj = re.compile(pattern, re.DOTALL)

match = pObj.match(entries[0])

match2 = pObj.match(entries[1])

print(match.group(1))
print("**********divider")
print(match2.group(1))


print("\n\nReprinting the input data\n\n")
print(entries[0])
print(entries[1])
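
The role of re.DOTALL in the snippet above can be checked in isolation. This is a small sketch with illustrative variable names: re.match only tries to match at the start of the string, so a leading '.*' has to cross the newlines before 'ephname', which it can only do with re.DOTALL. re.search scans forward by itself and does not need the leading '.*' at all.

```python
import re

text = 'other data\nephdelay = 17\nephname = qwerty.s92\nother data'

# Without DOTALL, '.' stops at the first newline, so 'ephname' is never reached.
without_dotall = re.match(r'.*ephname', text)

# With DOTALL, '.*' can run across the newlines up to 'ephname'.
with_dotall = re.match(r'.*ephname', text, re.DOTALL)

# re.search looks for the pattern at every position, no '.*' prefix required.
with_search = re.search(r'ephname\s*=', text)

print(without_dotall is None)    # True
print(with_dotall is not None)   # True
print(with_search is not None)   # True
```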

Answer 2 (score: 0)

This answer is based on the edited question.

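The assignment in the getTerms function is incorrect. You only take the first group, groups()[0], but the match can be in the second group. In addition, you return d, which you never modify.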

Here is how to do it:

# anything above this line was not changed

d = {}

# d is: term - list of matches
for i in range(len(terms)):
  d[terms[i]] = []

# for each entry
for entry in entries:

  # for each term
  for i in range(len(terms)):

    # get match
    m = re.search(patterns[i], entry)

    # matched
    if m:

      # check for group 1
      if m.group(1):
        # add match to the term list
        d[terms[i]].append(m.group(1))

      # check for group 2
      elif m.group(2):
        # add match to the term list
        d[terms[i]].append(m.group(2))

    # did not match
    else:
      # add null to the term list
      d[terms[i]].append('NULL')

print d
print
print d['ephsol']

Output:

{'ephsol': ['78', '92'], 'term3': ['NULL', 'NULL'], 'term1': ['NULL', 'NULL']}

['78', '92']

https://repl.it/Ko2L/0
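
The same fix can also be written more compactly as a function that builds and returns its own dictionary. This is a sketch only: the names COMMON_TERMS and get_terms are assumptions, and the term1 and term3 patterns are the placeholders from the question, which do not match the sample entries.

```python
import re

# Term table modelled on the question; only 'ephsol' has a real pattern.
COMMON_TERMS = [
    ("term1", r"#term1pattern"),
    ("ephsol", r"\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?"),
    ("term3", r"#term3pattern"),
]

def get_terms(entries, common_terms=COMMON_TERMS):
    """Return {term: [one value per entry]}, with 'NULL' where nothing was captured."""
    results = {term: [] for term, _ in common_terms}
    for entry in entries:
        for term, pattern in common_terms:
            m = re.search(pattern, entry)
            # Take the first group that is not None; fall back to 'NULL'
            # when there is no match or no group took part in it.
            groups = m.groups() if m else ()
            value = next((g for g in groups if g is not None), 'NULL')
            results[term].append(value)
    return results

entries = [
    'other data\nephdelay = 12\nephname = cfghjk78.comb\nother data',
    'other data\nephdelay = 17\nephname = qwerty.s92\nother data',
]

d = get_terms(entries)
print(d['ephsol'])  # ['78', '92']
```

Unlike the loop above, this variant appends 'NULL' when a pattern matches without capturing anything, so every list ends up with exactly one value per entry.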