Question

I'm trying to match the 'ephname' expressions below (whichever one is present, depending on the file), but I only want to capture the digits:

entries = ['other data\nephdelay = 12\nephname = cfghjk78.comb\nother data', 'other data\nephdelay = 17\nephname = qwerty.s92\nother data']

I'm using this as my regex, but no match shows up (although if I keep just one of the alternatives and remove the OR, it works for that one case):

\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?

I tested this on a regex website, but I can't figure out what the problem is. I want the output to be '92' or '78', depending on the entry. Why am I not getting any matches?

Edit: in my code I have this:

import re
commonterms = (["term1", "#term1pattern"],
               ["ephsol", "\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?"],
               ["term3", "#term3pattern"], ...)

terms = [commonterms[i][0] for i in range(len(commonterms))]
patterns = [commonterms[i][1] for i in range(len(commonterms))]

d = {t: [] for t in terms}

def getTerms(entry):
    for i in range(len(terms)):
        term = re.search(patterns[i], entry)
        term = term.groups()[0] if term else 'NULL'
    return d

for entry in entries:
    d = getTerms(entry)

print(d['ephsol'])

Then when I print d['ephsol'], I just get a bunch of NULLs, even though I know there should be matches.

Answer 1

The problem is that your match can land in either of two different groups:

import re

entries = ['other data\nephdelay = 12\nephname = cfghjk78.comb\nother data', 'other data\nephdelay = 17\nephname = qwerty.s92\nother data']

for e in entries:
    m = re.search(r'\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?', e)
    if m:
        print("Group 1: {}, Group 2: {} {}".format(m.group(1), m.group(2), m.groups()))

Prints:

Group 1: None, Group 2: 78 (None, '78')
Group 1: 92, Group 2: None ('92', None)

To print whichever one matched, you can do this:

for e in entries:
    m = re.search(r'\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?', e)
    if m:
        # Only one alternative can match, so use group 2 when group 1 is empty
        print(m.group(1) if m.group(1) is not None else m.group(2))

Prints:

78
92
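Because only one alternative can take part in any single match, Python's `Match.lastindex` (the index of the last capturing group that matched) already points at whichever group fired. A minimal sketch of that variant, reusing the same pattern and entries; `ephname_digits` is a hypothetical helper name:

```python
import re

entries = ['other data\nephdelay = 12\nephname = cfghjk78.comb\nother data',
           'other data\nephdelay = 17\nephname = qwerty.s92\nother data']

pattern = re.compile(r'\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?')

def ephname_digits(entry):
    """Return the digits captured by whichever alternative matched, or None."""
    m = pattern.search(entry)
    if m is None:
        return None
    # Each alternative holds exactly one capturing group, so lastindex is
    # the number (1 or 2) of the group that actually captured something.
    return m.group(m.lastindex)

results = [ephname_digits(e) for e in entries]
print(results)  # ['78', '92']
```

This relies on each alternative having exactly one capturing group; with nested or multiple groups per branch, `lastindex` may point at a different group than the one you want.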

Another option is to change your regex so that the capture always ends up in group 1:

for e in entries:
    # re.M lets ^ match at the start of every line, not just the string
    m = re.search(r'^ephname[ \t]*=[ \t]*[^0-9\n]*(\d+)(?:\.comb|\s)', e, flags=re.M)
    if m:
        print(m.group(1))

Prints:

78
92
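The `^` anchor in that pattern works per line only because of `flags=re.M` (`re.MULTILINE`). Without it, `^` matches only at the very start of the string, and each entry starts with `other data`. A short sketch comparing the two on the first entry:

```python
import re

entry = 'other data\nephdelay = 12\nephname = cfghjk78.comb\nother data'
pattern = r'^ephname[ \t]*=[ \t]*[^0-9\n]*(\d+)(?:\.comb|\s)'

# Without MULTILINE, ^ only matches at position 0, where 'other data' begins.
without_m = re.search(pattern, entry)

# With MULTILINE, ^ also matches right after each '\n'.
with_m = re.search(pattern, entry, flags=re.M)

print(without_m)        # None
print(with_m.group(1))  # 78
```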

Answer 2

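I think you need to remove some of your '?' characters, because they are throwing off the match; '' has to match at least once. It also helps to use re.DOTALL, so that the '.' character matches newlines ('\n') as well.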
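To see why `re.DOTALL` matters when a pattern begins with `.*` and is applied with `re.match()` (which anchors at the start of the string), here is a minimal sketch using a shortened prefix pattern chosen for illustration:

```python
import re

entry = 'other data\nephdelay = 17\nephname = qwerty.s92\nother data'
prefix = r'.*ephname'

# By default '.' does not match '\n', so match() (anchored at position 0)
# cannot reach 'ephname' on the third line.
print(re.match(prefix, entry))  # None

# With DOTALL, '.' also matches '\n', so '.*' can span the earlier lines.
print(re.match(prefix, entry, re.DOTALL) is not None)  # True
```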

Here's what I came up with:

import re

entries = ['other data\nephdelay = 12\nephname = cfghjk78.comb\nother data', 'other data\nephdelay = 17\nephname = qwerty.s92\nother data']

pattern = r'.*ephname\s*=\s*[a-zA-Z\.]*(\d+)\s*|.*ephname\s*=\s*[a-zA-Z\.]*(\d+)\.comb\s*'

pObj = re.compile(pattern, re.DOTALL)

match = pObj.match(entries[0])

match2 = pObj.match(entries[1])

print(match.group(1))
print("**********divider")
print(match2.group(1))


print("\n\nReprinting the input data\n\n")
print(entries[0])
print(entries[1])

Answer 3

This answer is based on the edited question.

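The assignment in your getTerms function is wrong. You only take the first element of groups() (index 0), but the match can be in the second group. Also, you return d, which you never modify.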

Here's how to do it:

# anything above this line was not changed

d = {}

# d is: term - list of matches
for i in range(len(terms)):
  d[terms[i]] = []

# for each entry
for entry in entries:

  # for each term
  for i in range(len(terms)):

    # get match
    m = re.search(patterns[i], entry)

    # matched
    if m:

      # check for group 1
      if m.group(1):
        # add match to the term list
        d[terms[i]].append(m.group(1))

      # check for group 2
      elif m.group(2):
        # add match to the term list
        d[terms[i]].append(m.group(2))

    # did not match
    else:
      # add null to the term list
      d[terms[i]].append('NULL')

print(d)
print()
print(d['ephsol'])

Output

{'ephsol': ['78', '92'], 'term3': ['NULL', 'NULL'], 'term1': ['NULL', 'NULL']}

['78', '92']

https://repl.it/Ko2L/0
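If you would rather keep the question's `getTerms` function, a minimal sketch of the same fix is to pass the dictionary in, append the matched digits (or `'NULL'`) for each term, and pick whichever capturing group took part in the match. The `term1`/`term3` patterns below are placeholders for the question's elided ones, and `first_group` is a hypothetical helper name:

```python
import re

entries = ['other data\nephdelay = 12\nephname = cfghjk78.comb\nother data',
           'other data\nephdelay = 17\nephname = qwerty.s92\nother data']

# term1/term3 patterns are placeholders (the question elides the real ones).
commonterms = (["term1", r"#term1pattern"],
               ["ephsol", r"\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?"],
               ["term3", r"#term3pattern"])

def first_group(m):
    """Return the first capturing group that took part in the match, else 'NULL'."""
    for g in m.groups():
        if g is not None:
            return g
    return 'NULL'

def getTerms(entry, d):
    """Append one result per term to d (mutated in place) and return it."""
    for term, pattern in commonterms:
        m = re.search(pattern, entry)
        d[term].append(first_group(m) if m else 'NULL')
    return d

d = {term: [] for term, _ in commonterms}
for entry in entries:
    getTerms(entry, d)

print(d['ephsol'])  # ['78', '92']
```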

Using regex 'or' and capturing groups at the same time
