Question

I have a file (test.txt) with the following content:

I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 0, Testing (#0)
I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.146329
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 1000, Testing (#0)
I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.246222
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 2000, Testing (#0)
I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.335429
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 3000, Testing (#0)
I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.445429
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 4000, Testing (#0)
I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.546429

My question is how to extract the iteration numbers (0, 1000, 2000, ..., 4000) and the test scores (0.146329, 0.246222, 0.335429, ..., 0.546429) and combine them into a dict.

For example, my expected result is:

dict = {'0': 0.146329,
        '1000': 0.246222,
        '2000': 0.335429,
        '3000': 0.445429,
        '4000': 0.546429}

Thanks in advance.

Answer 1

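import re

result = {}
iteration = None  # avoid shadowing the built-ins iter and dict
with open('test.txt') as f:
    for line in f:
        itermatch = re.search(r'Iteration (\d+)', line)
        if itermatch:
            iteration = itermatch.group(1)  # just the number, e.g. '1000'
        else:
            scorematch = re.search(r': ([0-9.]+)', line)
            if scorematch and iteration is not None:
                result[iteration] = float(scorematch.group(1))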
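The two re.search calls can also be folded into a single compiled pattern with named groups. The sketch below assumes every score line reads exactly "Test score #0: <number>" as in the sample; parse_scores is a hypothetical helper name, and io.StringIO stands in for open('test.txt') so the example runs without the file.

```python
import io
import re

# One compiled pattern with two named alternatives: an iteration line or a score line.
LINE_RE = re.compile(r'Iteration (?P<iter>\d+)|Test score #0: (?P<score>[0-9.]+)')

def parse_scores(lines):
    """Hypothetical helper: map iteration number (str) -> test score (float)."""
    result = {}
    iteration = None
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # neither an iteration line nor a score line
        if m.group('iter') is not None:
            iteration = m.group('iter')
        elif iteration is not None:
            result[iteration] = float(m.group('score'))
    return result

# io.StringIO stands in for open('test.txt') so the sketch runs on its own.
sample = io.StringIO(
    "I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 0, Testing (#0)\n"
    "I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.146329\n"
    "I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 1000, Testing (#0)\n"
    "I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.246222\n"
)
print(parse_scores(sample))  # {'0': 0.146329, '1000': 0.246222}
```

With the real file, pass the open file object instead: parse_scores(open('test.txt')).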

Answer 2

Here is a way to do it without regular expressions:

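result = {}
with open('test.txt') as in_file:
    for line in in_file:
        # Keep only the message after the "] " that ends the log prefix.
        data = line.strip().split('] ')[1]
        if ',' in data:
            # "Iteration 1000, Testing (#0)" -> "1000"
            key = data.split(',')[0]
            key = key.split(' ')[1]
        else:
            # "Test score #0: 0.246222" -> "0.246222"
            val = data.split(':')[1].strip()
            result[key] = val
print(result)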

This gives:

{'0': '0.146329',
 '1000': '0.246222',
 '2000': '0.335429',
 '3000': '0.445429',
 '4000': '0.546429'}
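The values above are strings, while the question expects floats. A minimal sketch of the conversion, assuming the dict printed above is available as result:

```python
# The dict printed above, with scores still stored as strings.
result = {'0': '0.146329', '1000': '0.246222', '2000': '0.335429',
          '3000': '0.445429', '4000': '0.546429'}

# Convert each score to float to match the expected result in the question.
scores = {key: float(value) for key, value in result.items()}
print(scores)
```

Alternatively, store float(val) directly inside the loop instead of val.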

Answer 3

(?<=Iteration\s)(\d+)|(?<=Test score\s#0:\s)(\S+)

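You can use this regex: group 1 captures the iteration number and group 2 captures the test score. Just collect the matches and pair them up.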

See the demo:

http://regex101.com/r/kM7rT8/16
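A sketch of how the regex above might be used from Python with re.findall, assuming iteration lines and score lines strictly alternate as in the sample. Because the pattern has two groups, findall returns (iteration, score) tuples in which exactly one element is non-empty. LOG is an embedded excerpt of the sample; with the real file you would use open('test.txt').read() instead. parse is a hypothetical helper name.

```python
import re

# Excerpt of the sample log from the question.
LOG = """\
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 0, Testing (#0)
I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.146329
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 1000, Testing (#0)
I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.246222
"""

# The regex from the answer; both lookbehinds are fixed-width, as Python requires.
PATTERN = re.compile(r'(?<=Iteration\s)(\d+)|(?<=Test score\s#0:\s)(\S+)')

def parse(text):
    iterations = []
    scores = []
    # Each findall result is an (iteration, score) tuple; one side is ''.
    for iteration, score in PATTERN.findall(text):
        if iteration:
            iterations.append(iteration)
        else:
            scores.append(float(score))
    # Assumes the two lists line up one-to-one, as in the sample.
    return dict(zip(iterations, scores))

print(parse(LOG))  # {'0': 0.146329, '1000': 0.246222}
```

Note that zip silently drops unmatched trailing items, so a log where a score line is missing would pair the wrong values; the line-by-line loops in the other answers are more robust to that.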
