Summing a column in a text file

Time: 2015-11-06 17:26:19

Tags: python linux python-2.7

I have a data file that looks like this:

 TOPIC:topic_0 2056
 ab  2.0
 cd  5.0
 ef  3.0
 gh  10.0

 TOPIC:topic_1 1000
 aa  3.0
 bd  5.0
 gh  2.0

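And so on, up to TOPIC:topic_2000. The first line of each block is the topic and its weight. The lines below it are the words in that topic and their respective weights.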

Now I want to sum the second column for each topic and check the value it gives. That is, I would like the output to be:

 Topic:topic_0  20
 Topic:topic_1  10

That is, the topic name and the sum of the column values (in the first topic, topic_0, the word weights are 2, 5, 3, 10). I tried using:

with open('Input.txt') as in_file:
    for line in in_file:
        columns = line.split(' ')
        value = columns[0]

        if value[:6] == 'TOPIC:':
            total_value = columns[1]
            total_value = total_value[:-1]
            total_values = float(total_value)
            #print '\n'
            print columns[0]

However, I don't know how to proceed from here. This only prints the topic names. Please help!

2 answers:

Answer 0: (score: 1)

Answer 1: (score: 1)

Try this. It is compatible with Python 2.7 and 3.5:

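import re

topic = {}  # topic name -> running sum of its word weights
temp = ''   # name of the topic currently being read
p = re.compile('[a-z]*')  # matches the lowercase word so it can be stripped

with open('Input.txt') as in_file:
    for line in in_file:
        line = line.strip()
        if not line:
            continue  # skip the blank lines between topics

        if line.startswith('TOPIC:'):
            # Header line: keep only the topic name, e.g. 'topic_0'
            temp = line.split(' ')[0].replace('TOPIC:', '')
            topic[temp] = 0
        else:
            # Word line: remove the word, leaving only its weight
            value = p.sub('', line).strip()
            topic[temp] += float(value)

# A plain dict does not keep insertion order in Python 2.7
for key in topic:
    print("Topic:%s %s" % (key, topic[key]))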

Result:

$ /c/Python27/python.exe input.py
Topic:topic_1 10.0
Topic:topic_0 20.0
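
The regex above only strips lowercase letters, so a word containing digits or uppercase characters would break the float conversion. As a rough alternative sketch (not part of the original answer), the weight can be taken as the last whitespace-separated field instead. The function name `sum_topic_weights` is hypothetical, and the code assumes every word line ends with its weight. An `OrderedDict` keeps the topics in file order on Python 2.7 as well.

```python
from collections import OrderedDict


def sum_topic_weights(lines):
    """Map each topic name to the sum of its word weights, in file order."""
    totals = OrderedDict()
    current = None  # topic whose word lines are currently being read
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if not fields:
            continue  # blank separator line
        if fields[0].startswith('TOPIC:'):
            # Header line: drop the 'TOPIC:' prefix and start a new sum
            current = fields[0][len('TOPIC:'):]
            totals[current] = 0.0
        elif current is not None:
            # Assumption: the weight is always the last field on a word line
            totals[current] += float(fields[-1])
    return totals


if __name__ == '__main__':
    # Sample data copied from the question
    sample = [
        'TOPIC:topic_0 2056', 'ab  2.0', 'cd  5.0', 'ef  3.0', 'gh  10.0',
        '',
        'TOPIC:topic_1 1000', 'aa  3.0', 'bd  5.0', 'gh  2.0',
    ]
    for name, total in sum_topic_weights(sample).items():
        print("Topic:%s %s" % (name, total))
    # Prints:
    # Topic:topic_0 20.0
    # Topic:topic_1 10.0
```

To run it on the real file, pass the open file object instead of the sample list, for example `sum_topic_weights(in_file)` inside `with open('Input.txt') as in_file:`.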