Parsing a text file and grouping certain values

Time: 2013-06-13 11:10:47

Tags: python python-2.7

I'm trying to read a text file, parse it in a specific way, and then rewrite the file with the output.

The text file (input) looks like this:

2 108 1 561 1 20 28 1
2 108 2 557 1 24 32 1
5 28 1 553 197 20 20 1
5 28 2 552 197 23 21 1
6 23 1 113 393 36 36 1
6 23 2 113 391 39 39 1

Each column represents a specific value:

[ID] [Length] [Frame] [X] [Y] [W] [H]

For example, this line:

2 108 1 561 1 20 28 1

is actually: ID: 2, Length: 108, Frame: 1, X: 561, Y: 1, W: 20, H: 28

The final value, 1, is not needed at all.
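To make the column layout concrete, here is a minimal sketch of parsing one line into named fields. The `Record` type and the `parse_line` helper are hypothetical names introduced for illustration; they are not part of my actual code. The single-argument `print(...)` form is used so the snippet runs under both Python 2.7 and Python 3.

```python
from collections import namedtuple

# Hypothetical record type; field names follow the column list above.
Record = namedtuple('Record', 'id length frame x y w h')


def parse_line(line):
    """Turn one whitespace-separated input line into a Record.

    Only the first seven columns are kept; the trailing eighth
    value is dropped because it is not needed.
    """
    values = [int(tok) for tok in line.split()]
    return Record(*values[:7])


rec = parse_line('2 108 1 561 1 20 28 1')
print('{} {} {}'.format(rec.frame, rec.id, [rec.x, rec.y, rec.w, rec.h]))
```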

This is what I have so far:

with open('1.txt') as fin:
    frame_rects = {}
    for row in (map(int, line.split()) for line in fin):
        id, frame, rect = row[0], row[2], row[3:7]
        frame_rects[frame] = (id, rect)
        first_data = ('{} {} {}\n'.format(frame, id, rect))
        print first_data

This outputs the following:

1 2 [561, 1, 20, 28]

2 2 [557, 1, 24, 32]

1 5 [553, 197, 20, 20]

2 5 [552, 197, 23, 21]

1 6 [113, 393, 36, 36]

2 6 [113, 391, 39, 39]

That is the first step, but my expected output is the following:

1 2 [561, 1, 20, 28] 5 [553, 197, 20, 20] 6 [113, 393, 36, 36]
2 2 [557, 1, 24, 32] 5 [552, 197, 23, 21] 6 [113, 391, 39, 39]

So for each frame, I append all the IDs that appear in that particular frame, along with their values.

So in frame 1, IDs 2, 5 and 6 each appear with their own values (x, y, w, h).

Each frame key is unique, but it can hold as many ID + value pairs as needed, as long as those IDs actually appear in that frame.

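I need to run this on a text file that could contain thousands of lines. Each frame can hold ~20 different IDs. How would I go about achieving the expected output?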

2 Answers:

Answer 0: (score: 3)

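from collections import defaultdict
with open('abc') as f:
    # Map each frame number to a list of [id, [X, Y, W, H]] entries.
    dic = defaultdict(list)
    for line in f:
        # The eighth column is unused, so it is discarded into `_`.
        idx, length, frame, X, Y, W, H, _ = map(int, line.split())
        dic[frame].append([idx, [X, Y, W, H]])
print dic
print "Expected output:"
# Sort by frame number, since dict ordering is not guaranteed.
for k, v in sorted(dic.items()):
    print "{} {}".format(k, "".join(["{} {} ".format(*lis) for lis in v]))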

Output:

defaultdict(<type 'list'>,
{1: [[2, [561, 1, 20, 28]], [5, [553, 197, 20, 20]], [6, [113, 393, 36, 36]]],
 2: [[2, [557, 1, 24, 32]], [5, [552, 197, 23, 21]], [6, [113, 391, 39, 39]]]})
Expected output:
1 2 [561, 1, 20, 28] 5 [553, 197, 20, 20] 6 [113, 393, 36, 36] 
2 2 [557, 1, 24, 32] 5 [552, 197, 23, 21] 6 [113, 391, 39, 39] 
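Note that each line printed above ends with a trailing space, because every `"{} {} "` chunk carries one. A minimal sketch of a variant that joins the chunks with a separator instead and emits frames in ascending order could look like this. The helper names `group_by_frame` and `format_frames` are made up for illustration, and the code is written to run under both Python 2.7 and Python 3.

```python
from collections import defaultdict


def group_by_frame(lines):
    """Map each frame number to a list of (id, [x, y, w, h]) pairs."""
    frames = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = [int(tok) for tok in line.split()]
        frames[row[2]].append((row[0], row[3:7]))
    return frames


def format_frames(frames):
    """Return one output line per frame, in ascending frame order."""
    out = []
    for frame in sorted(frames):
        parts = ['{} {}'.format(obj_id, rect) for obj_id, rect in frames[frame]]
        # Joining with a separator avoids the trailing space.
        out.append('{} {}'.format(frame, ' '.join(parts)))
    return out


sample = [
    '2 108 1 561 1 20 28 1',
    '2 108 2 557 1 24 32 1',
    '5 28 1 553 197 20 20 1',
    '5 28 2 552 197 23 21 1',
    '6 23 1 113 393 36 36 1',
    '6 23 2 113 391 39 39 1',
]
for text in format_frames(group_by_frame(sample)):
    print(text)
```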

Answer 1: (score: 2)

Do this:

from collections import defaultdict

with open('1.txt') as fin:
    frame_rects = defaultdict(list)
    for row in (map(int, line.split()) for line in fin):
        id, frame, rect = row[0], row[2], row[3:7]
        frame_rects[frame].append((id, rect))
        # print '{} {} {}'.format(frame, id, rect) # (if you want to sample)

for key, value in frame_rects.items():
    print key, ' '.join([' '.join([str(i) for i in v]) for v in value])

Output:

1 2 [561, 1, 20, 28] 5 [553, 197, 20, 20] 6 [113, 393, 36, 36]
2 2 [557, 1, 24, 32] 5 [552, 197, 23, 21] 6 [113, 391, 39, 39]
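Since the question also asks about rewriting the file with the output, here is a rough sketch of the same grouping wrapped in a function that writes the result to disk instead of printing it. The name `regroup_file` and its two path parameters are assumptions made for illustration; the logic is the `defaultdict(list)` approach shown above, written to run under both Python 2.7 and Python 3.

```python
from collections import defaultdict


def regroup_file(src_path, dst_path):
    """Group the rows of src_path by frame and write one line per frame.

    Returns the number of distinct frames written.
    """
    frames = defaultdict(list)
    with open(src_path) as fin:
        for line in fin:
            row = [int(tok) for tok in line.split()]
            if len(row) < 7:
                continue  # skip blank or incomplete lines
            # row[0] is the ID, row[2] the frame, row[3:7] the rectangle.
            frames[row[2]].append('{} {}'.format(row[0], row[3:7]))
    with open(dst_path, 'w') as fout:
        for frame in sorted(frames):
            fout.write('{} {}\n'.format(frame, ' '.join(frames[frame])))
    return len(frames)
```

It would be called as, for example, `regroup_file('1.txt', 'out.txt')`. The whole input is read and closed before the output is opened, so passing the same path for both arguments would overwrite the input in place. Everything is held in memory, which should be fine for a few thousand lines.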