Find certain words in a file with Python

Time: 2014-09-23 02:42:22

Tags: python regex

I have a file (test.txt) with the following content:

I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 0, Testing (#0)
I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.146329
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 1000, Testing (#0)
I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.246222
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 2000, Testing (#0)
I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.335429
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 3000, Testing (#0)
I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.445429
I0914 17:37:15.763941 29832 abc.cpp:138] Iteration 4000, Testing (#0)
I0922 16:14:14.933842  2057 abc.cpp:176] Test score #0: 0.546429

My question is how to get the iteration numbers (0, 1000, 2000, ..., 4000) and the test scores (0.146329, 0.246222, 0.335429, ..., 0.546429) and combine them into a dict.

For example, my expected result is:

dict = {'0': 0.146329,
        '1000': 0.246222,
        '2000': 0.335429,
        '3000': 0.445429,
        '4000': 0.546429}

Thanks in advance.

3 answers:

Answer 0 (score: 1)

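import re

scores = {}
iteration = None
with open('test.txt') as log_file:
    for line in log_file:
        iter_match = re.search(r'Iteration (\d+)', line)
        if iter_match:
            # Remember the iteration number until its score line arrives.
            iteration = iter_match.group(1)
            continue
        score_match = re.search(r': ([0-9.]+)', line)
        if score_match and iteration is not None:
            # Capture only the number, not the leading ': '.
            scores[iteration] = float(score_match.group(1))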

Answer 1 (score: 0)

Here is a way to do it without regular expressions:

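result = {}
with open('test.txt') as in_file:
    for line in in_file:
        # Keep only the message after the "file:line] " prefix.
        data = line.strip().split('] ')[1]
        if ',' in data:
            # "Iteration 1000, Testing (#0)" -> "1000"
            key = data.split(',')[0]
            key = key.split(' ')[1]
        else:
            # "Test score #0: 0.246222" -> "0.246222"
            val = (data.split(':')[1]).strip()
            print(val)
            result[key] = val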

This gives:

{'0': '0.146329',
 '1000': '0.246222',
 '2000': '0.335429',
 '3000': '0.445429',
 '4000': '0.546429'}
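
Note that the values here are strings, while the expected result in the question holds floats. A minimal sketch of the conversion, assuming every value parses as a number (`to_float_values` is a hypothetical helper name, not part of the answer):

```python
def to_float_values(result):
    """Return a copy of result with every value converted to float."""
    return {key: float(value) for key, value in result.items()}


if __name__ == '__main__':
    print(to_float_values({'0': '0.146329', '1000': '0.246222'}))
```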

Answer 2 (score: 0)

(?<=Iteration\s)(\d+)|(?<=Test score\s#0:\s)(\S+)

You can use this regex. Just grab the matches and use them.

See the demo.

http://regex101.com/r/kM7rT8/16
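
One possible way to turn those matches into the dict from the question is sketched below. It assumes each "Test score" line belongs to the most recent "Iteration" line, as in the sample file; `parse_scores` and `PATTERN` are hypothetical names introduced here, not part of the answer.

```python
import re

# The regex from the answer: group 1 captures an iteration number,
# group 2 captures a test score. Only one group is set per match.
PATTERN = re.compile(r'(?<=Iteration\s)(\d+)|(?<=Test score\s#0:\s)(\S+)')


def parse_scores(text):
    """Pair each iteration number with the test score that follows it."""
    result = {}
    current_iteration = None
    for match in PATTERN.finditer(text):
        iteration, score = match.groups()
        if iteration is not None:
            current_iteration = iteration
        elif current_iteration is not None:
            result[current_iteration] = float(score)
    return result


if __name__ == '__main__':
    with open('test.txt') as log_file:
        print(parse_scores(log_file.read()))
```

The keys stay strings and the values become floats, matching the expected result in the question.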