I have a file (test.txt) with the following content:
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 0, Testing (#0)
I0922 16:14:14.933842 2057 abc.cpp:176] Test score #0: 0.146329
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 1000, Testing (#0)
I0922 16:14:14.933842 2057 abc.cpp:176] Test score #0: 0.246222
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 2000, Testing (#0)
I0922 16:14:14.933842 2057 abc.cpp:176] Test score #0: 0.335429
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 3000, Testing (#0)
I0922 16:14:14.933842 2057 abc.cpp:176] Test score #0: 0.445429
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 4000, Testing (#0)
I0922 16:14:14.933842 2057 abc.cpp:176] Test score #0: 0.546429
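My question is how to extract the iteration numbers (0, 1000, 2000, ..., 4000) and the test scores (0.146329, 0.246222, 0.335429, ..., 0.546429) and combine them into a dict.
For example, my expected result is:
dict = {'0': 0.146329,
        '1000': 0.246222,
        '2000': 0.335429,
        '3000': 0.445429,
        '4000': 0.546429}
Thanks in advance.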
Answer 0 (score: 1)
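import re

result = {}
iteration = None
with open('test.txt') as in_file:
    for line in in_file:
        # An "Iteration N" line sets the key for the score line that follows it.
        itermatch = re.search(r'Iteration (\d+)', line)
        if itermatch:
            iteration = itermatch.group(1)  # digits only, e.g. '1000'
            continue
        # Capture only the number after ': ', not the ': ' prefix itself.
        scorematch = re.search(r': ([0-9.]+)', line)
        if scorematch and iteration is not None:
            result[iteration] = float(scorematch.group(1))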
Answer 1 (score: 0)
Here is a way to do it without regular expressions:
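result = {}
with open('test.txt') as in_file:
    for line in in_file:
        # Keep only the message part after the "file:line] " prefix.
        data = line.strip().split('] ')[1]
        if ',' in data:
            # "Iteration 1000, Testing (#0)" -> "1000"
            key = data.split(',')[0]
            key = key.split(' ')[1]
        else:
            # "Test score #0: 0.246222" -> "0.246222"
            val = data.split(':')[1].strip()
            result[key] = val
print(result)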
This gives:
{'0': '0.146329',
'1000': '0.246222',
'2000': '0.335429',
'3000': '0.445429',
'4000': '0.546429'}
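Note that the values above are strings, while the expected result in the question uses numeric scores. A minimal sketch of the conversion, assuming every value parses as a float (the helper name to_float_scores is made up for illustration):

```python
def to_float_scores(scores):
    """Return a copy of the dict with each score string converted to float."""
    return {key: float(value) for key, value in scores.items()}


# Two entries taken from the output shown above.
raw = {'0': '0.146329', '1000': '0.246222'}
converted = to_float_scores(raw)
print(converted)
```

The keys stay as strings, which matches the dict layout asked for in the question.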
Answer 2 (score: 0)
(?<=Iteration\s)(\d+)|(?<=Test score\s#0:\s)(\S+)
You can use this regex. Just grab the matches and use them.
See the demo.
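For illustration, here is one way the matches from that regex might be turned into the requested dict in Python. This is only a sketch: the function name parse_scores is hypothetical, and it assumes each "Test score" line follows its "Iteration" line, as in the sample log.

```python
import re

# The pattern from the answer: group 1 is the iteration number,
# group 2 is the test score. Both lookbehinds are fixed-width,
# which Python's re module requires.
PATTERN = re.compile(r'(?<=Iteration\s)(\d+)|(?<=Test score\s#0:\s)(\S+)')


def parse_scores(text):
    """Pair each iteration number with the score that follows it."""
    result = {}
    current = None
    for match in PATTERN.finditer(text):
        iteration, score = match.groups()
        if iteration is not None:
            current = iteration              # remember the key
        elif current is not None:
            result[current] = float(score)   # attach the score to it
    return result


sample = (
    "I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 0, Testing (#0)\n"
    "I0922 16:14:14.933842 2057 abc.cpp:176] Test score #0: 0.146329\n"
    "I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 1000, Testing (#0)\n"
    "I0922 16:14:14.933842 2057 abc.cpp:176] Test score #0: 0.246222\n"
)
print(parse_scores(sample))
```

To run it on the real file, the text could be read with open('test.txt').read() and passed to parse_scores.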