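I'm trying to match the expression that follows 'ephname' in the entries below (which form it takes depends on which file is present), but I only want to capture the number:

    entries = ['other data\nephdelay = 12\nephname = cfghjk78.comb\nother data', 'other data\nephdelay = 17\nephname = qwerty.s92\nother data']

I'm using the following as my regex, but I get no matches (although if I keep just one alternative and remove the '|', it works for that one case):

    \s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?

I tested this on a regex website, but I can't see what the problem is. I want the output to be '92' or '78', depending on the entry. Why am I not getting any matches?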

Edit: In my code I have this:

    import re

    commonterms = (["term1", "#term1pattern"],
                   ["ephsol", r"\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?"],
                   ["term3", "#term3pattern"], ...)
    terms = [commonterms[i][0] for i in range(len(commonterms))]
    patterns = [commonterms[i][1] for i in range(len(commonterms))]
    d = {t: [] for t in terms}

    def getTerms(entry):
        for i in range(len(terms)):
            term = re.search(patterns[i], entry)
            term = term.groups()[0] if term else 'NULL'
        return d

    for entry in entries:
        d = getTerms(entry)
    print(d['ephsol'])

Then when I print d['ephsol'], I just get a bunch of NULLs, even though I know there should be matches.

Answer 0 (score: 3)

The problem you're running into is that the match can land in either of two different groups:

    import re

    entries = ['other data\nephdelay = 12\nephname = cfghjk78.comb\nother data', 'other data\nephdelay = 17\nephname = qwerty.s92\nother data']
    for e in entries:
        m = re.search(r'\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?', e)
        if m:
            print("Group 1: {}, Group 2: {} {}".format(m.group(1), m.group(2), m.groups()))

Prints:

    Group 1: None, Group 2: 78 (None, '78')
    Group 1: 92, Group 2: None ('92', None)

To print whichever group matched, you can do:

    for e in entries:
        m = re.search(r'\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?', e)
        if m:
            print(m.group(1) if m.group(1) is not None else m.group(2))

Prints:

    78
    92
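
If the pattern ever grows beyond two alternatives, chaining "is not None" checks gets unwieldy. Below is a minimal sketch of a more general approach, assuming at most one capture group takes part in any given match. The helper name first_group and the 'NULL' default are made up for illustration and are not part of the re module.

```python
import re

# Same alternation as in the question: the digits land in group 1 or group 2.
PATTERN = re.compile(
    r'\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?'
)


def first_group(match, default=None):
    """Return the first capture group that took part in the match.

    Hypothetical helper: returns `default` when there is no match at all,
    or when no group captured anything.
    """
    if match is None:
        return default
    return next((g for g in match.groups() if g is not None), default)


entries = [
    'other data\nephdelay = 12\nephname = cfghjk78.comb\nother data',
    'other data\nephdelay = 17\nephname = qwerty.s92\nother data',
]
results = [first_group(PATTERN.search(e), 'NULL') for e in entries]
print(results)
```

Comparing against None rather than relying on truthiness matters in general, because a group that legitimately captures an empty string is falsy but still a match.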

Another option is to change your regex so that the capture is always in group 1:

    for e in entries:
        m = re.search(r'^ephname[ \t]*=[ \t]*[^0-9\n]*(\d+)(?:\.comb|\s)', e, flags=re.M)
        if m:
            print(m.group(1))

Prints:

    78
    92
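
To make it easier to see why the single-group pattern works, here is the same regex written out with re.VERBOSE so each piece can carry a comment. This is only an annotated sketch of the pattern above; the names EPH_NUMBER and eph_number are assumptions introduced for the example.

```python
import re

EPH_NUMBER = re.compile(
    r'''
    ^ephname        # the key at the start of a line (needs re.MULTILINE)
    [ \t]*=[ \t]*   # the equals sign with optional spaces or tabs around it
    [^0-9\n]*       # any non-digit characters, staying on the same line
    (\d+)           # the number: always capture group 1
    (?:\.comb|\s)   # followed by a literal .comb or by whitespace
    ''',
    re.MULTILINE | re.VERBOSE,
)


def eph_number(entry):
    """Return the digits from the ephname line, or None if there are none."""
    m = EPH_NUMBER.search(entry)
    return m.group(1) if m else None


entries = [
    'other data\nephdelay = 12\nephname = cfghjk78.comb\nother data',
    'other data\nephdelay = 17\nephname = qwerty.s92\nother data',
]
numbers = [eph_number(e) for e in entries]
print(numbers)
```

One thing to watch: the final alternative needs some character after the digits, so an ephname line that ends the string with no trailing newline (for example 'ephname = qwerty.s92') would not match. Adding an end-of-line alternative, as in (?:\.comb|\s|$), is one way to allow for that case.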
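
Answer 1 (score: 2)

I think you need to get rid of some of your '?' characters, because they are getting in the way of the match. Note that '+' requires at least one match. It also helps to use re.DOTALL so that the '.' character matches newlines ('\n') as well.

Here is what I came up with: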

    import re

    entries = ['other data\nephdelay = 12\nephname = cfghjk78.comb\nother data', 'other data\nephdelay = 17\nephname = qwerty.s92\nother data']
    pattern = r'.*ephname\s*=\s*[a-zA-Z\.]*(\d+)\s*|.*ephname\s*=\s*[a-zA-Z\.]*(\d+)\.comb\s*'
    pObj = re.compile(pattern, re.DOTALL)
    match = pObj.match(entries[0])
    match2 = pObj.match(entries[1])
    print(match.group(1))
    print("**********divider")
    print(match2.group(1))
    print("\n\nReprinting the input data\n\n")
    print(entries[0])
    print(entries[1])
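
The role of re.DOTALL here is easy to miss, so the following sketch isolates it. It uses a simplified, single-alternative version of the pattern above (an assumption made for illustration, not the answer's exact regex) and compares re.match with and without the flag, plus re.search as an alternative that needs neither the leading '.*' nor the flag.

```python
import re

entry = 'other data\nephdelay = 17\nephname = qwerty.s92\nother data'
pattern = r'.*ephname\s*=\s*[a-zA-Z.]*(\d+)'

# re.match anchors at the start of the string, and without re.DOTALL the
# leading '.*' stops at the first newline, so it never reaches 'ephname'.
without_dotall = re.match(pattern, entry)

# With re.DOTALL, '.' also matches '\n', so '.*' can skip the earlier lines.
with_dotall = re.match(pattern, entry, re.DOTALL)

# re.search scans through the string on its own, so the leading '.*' and
# the DOTALL flag are both unnecessary.
with_search = re.search(r'ephname\s*=\s*[a-zA-Z.]*(\d+)', entry)

print(without_dotall)
print(with_dotall.group(1))
print(with_search.group(1))
```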
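
Answer 2 (score: 0)

This answer is based on the edited question.

The assignment in the getTerms function is incorrect. You only take the first group (index 0 of groups()), but the match can be in the second group. Also, you return d, which you never modify.

Here is how to do it: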

    # anything above this line was not changed
    d = {}
    # d is: term - list of matches
    for i in range(len(terms)):
        d[terms[i]] = []
    # for each entry
    for entry in entries:
        # for each term
        for i in range(len(terms)):
            # get match
            m = re.search(patterns[i], entry)
            # matched
            if m:
                # check for group 1
                if m.group(1):
                    # add match to the term list
                    d[terms[i]].append(m.group(1))
                # check for group 2
                elif m.group(2):
                    # add match to the term list
                    d[terms[i]].append(m.group(2))
            # did not match
            else:
                # add null to the term list
                d[terms[i]].append('NULL')
    print(d)
    print("")
    print(d['ephsol'])

Output:

    {'ephsol': ['78', '92'], 'term3': ['NULL', 'NULL'], 'term1': ['NULL', 'NULL']}
    ['78', '92']
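
The same fix can be packaged as a function that builds and returns its own dictionary, which avoids depending on a module-level d. This is a sketch under assumptions: only the 'ephsol' pattern comes from the question, while the 'ephdelay' term, the COMMON_TERMS name, and the get_terms signature are invented for the example.

```python
import re

# Hypothetical term table: only the 'ephsol' pattern comes from the question.
COMMON_TERMS = [
    ("ephsol",
     r"\s?ephname\s?=\s?.*?\.s(\d+)\s?|\s?ephname\s?=\s?.*?(\d+)\.comb\s?"),
    ("ephdelay", r"ephdelay\s*=\s*(\d+)"),  # illustrative extra term
]


def get_terms(entries, common_terms=COMMON_TERMS, missing='NULL'):
    """Map each term name to one extracted value per entry."""
    found = {name: [] for name, _ in common_terms}
    for entry in entries:
        for name, pattern in common_terms:
            m = re.search(pattern, entry)
            value = None
            if m:
                # Take the first group that took part in the match, if any.
                value = next((g for g in m.groups() if g is not None), None)
            found[name].append(missing if value is None else value)
    return found


entries = [
    'other data\nephdelay = 12\nephname = cfghjk78.comb\nother data',
    'other data\nephdelay = 17\nephname = qwerty.s92\nother data',
]
d = get_terms(entries)
print(d['ephsol'])
print(d['ephdelay'])
```

Because every term appends exactly one value per entry (a capture or the 'NULL' placeholder), the lists stay aligned with the order of entries.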