Question

I have a data file that looks like this:

 TOPIC:topic_0 2056
 ab  2.0
 cd  5.0
 ef  3.0
 gh  10.0

 TOPIC:topic_1 1000
 aa  3.0
 bd  5.0
 gh  2.0

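and so on, up to TOPIC:topic_2000. The first line of each block is the topic and its weight; the lines that follow are the words in that topic and their respective weights.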

Now I want to sum the second column for each topic and check the value it gives. That is, I want the output to be:

 Topic:topic_0  20
 Topic:topic_1  10

that is, the topic name and the sum of the column values (for the first topic, topic_0, the word weights are 2, 5, 3, 10). I tried using:

with open('Input.txt') as in_file:
    for line in in_file:
        columns = line.split(' ')
        value = columns[0]

        if value[:6] == 'TOPIC:':
            total_value = columns[1]
            total_value = total_value[:-1]
            total_values = float(total_value)
            #print '\n'
            print columns[0]

However, I don't know how to go on from here. This only prints the topic name. Please help!

Answer 1

Try this. It is compatible with Python 2.7 and 3.5:

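import re

topic = {}   # topic name -> running sum of its word weights
temp = ''    # name of the topic currently being read
p = re.compile('[a-z]*')  # matches runs of lowercase letters (the word on a data line)

with open('Input.txt') as in_file:
    for line in in_file:
        line = line.strip()
        if not line:
            continue  # skip the blank lines between topics

        if line.startswith('TOPIC:'):
            # Header line: keep the topic name and start its sum at zero
            temp = line.split(' ')[0].replace('TOPIC:', '')
            topic[temp] = 0.0
        else:
            # Data line: strip the word, leaving only the weight
            value = p.sub('', line).strip()
            topic[temp] += float(value)

for key in topic:
    print("Topic:%s %s" % (key, topic[key]))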

Result:

$ /c/Python27/python.exe input.py
Topic:topic_1 10.0
Topic:topic_0 20.0
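
A variation on the same idea, offered only as a minimal sketch: it assumes every word line ends with its numeric weight, so the weight can be taken with `split()` instead of a regular expression, and it keeps the topics in file order. The function name `sum_topic_weights` and the inline `sample` text are illustrative and not part of the original answer.

```python
from collections import OrderedDict


def sum_topic_weights(lines):
    """Map each topic name to the sum of its word weights, in file order."""
    totals = OrderedDict()
    current = None  # name of the topic block being read
    for line in lines:
        fields = line.split()  # splits on any run of whitespace
        if not fields:
            continue  # blank separator line
        if fields[0].startswith('TOPIC:'):
            # Header line: remember the topic and start its sum at zero
            current = fields[0][len('TOPIC:'):]
            totals[current] = 0.0
        elif current is not None:
            # Word line: assumed to end with the weight
            totals[current] += float(fields[-1])
    return totals


# Sample data copied from the question
sample = """\
TOPIC:topic_0 2056
ab  2.0
cd  5.0
ef  3.0
gh  10.0

TOPIC:topic_1 1000
aa  3.0
bd  5.0
gh  2.0
"""

for name, total in sum_topic_weights(sample.splitlines()).items():
    print("Topic:%s %s" % (name, total))
```

To run it on the real file, pass the open file object instead of the sample lines, for example `sum_topic_weights(in_file)` inside `with open('Input.txt') as in_file:`.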

Summing a column in a text file
