I have a data file that looks like this:
TOPIC:topic_0 2056
ab 2.0
cd 5.0
ef 3.0
gh 10.0
TOPIC:topic_1 1000
aa 3.0
bd 5.0
gh 2.0
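and so on, up to TOPIC:topic_2000. The first line of each block is the topic and its weight; the lines below it are the words in that topic with their respective weights.
Now I want to sum the second column for each topic and see what value it gives. That is, I want the output to be: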
Topic:topic_0 20
Topic:topic_1 10
That is, the topic name followed by the sum of the column values (for the first topic, topic_0, the word weights are 2, 5, 3, 10). I tried:
with open('Input.txt') as in_file:
    for line in in_file:
        columns = line.split(' ')
        value = columns[0]
        if value[:6] == 'TOPIC:':
            total_value = columns[1]
            total_value = total_value[:-1]
            total_values = float(total_value)
            #print '\n'
            print columns[0]
However, I don't know how to go on from here. This only prints the topic name. Please help!
Answer 0 (score: 1)
Answer 1 (score: 1)
Try this. It works on both Python 2.7 and 3.5:
topic = {}   # maps topic name -> running sum of its word weights
temp = ''    # name of the topic currently being read
with open('Input.txt') as in_file:
    for line in in_file:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('TOPIC:'):
            # Header line: start a new running total for this topic
            temp = line.split(' ')[0].replace('TOPIC:', '')
            topic[temp] = 0.0
        else:
            # Word line: the weight is the last whitespace-separated field
            value = line.split()[-1]
            topic[temp] += float(value)
for key in topic:
    print("Topic:%s %s" % (key, topic[key]))
Result:
$ /c/Python27/python.exe input.py
Topic:topic_1 10.0
Topic:topic_0 20.0
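Note that the topics above come out in arbitrary order, since a plain dict in Python 2.7 is unordered. If you want them in file order and formatted exactly as in the question, here is a minimal sketch of one way to do it. The function name sum_topic_weights is just something I made up for illustration, and the sample data is inlined so the snippet runs on its own; with the real file you would pass the open file object instead. It assumes the weight is always the last whitespace-separated field on a word line.

```python
from collections import OrderedDict


def sum_topic_weights(lines):
    """Return an OrderedDict mapping each topic name to the sum of its word weights."""
    totals = OrderedDict()
    current = None  # name of the topic whose word lines are being read
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0].startswith('TOPIC:'):
            # Header line: strip the 'TOPIC:' prefix and start a new total.
            current = parts[0][len('TOPIC:'):]
            totals[current] = 0.0
        elif current is not None:
            # Word line: assume the weight is the last field on the line.
            totals[current] += float(parts[-1])
    return totals


# Sample data copied from the question, inlined for a self-contained run.
sample = """TOPIC:topic_0 2056
ab 2.0
cd 5.0
ef 3.0
gh 10.0
TOPIC:topic_1 1000
aa 3.0
bd 5.0
gh 2.0
""".splitlines()

for name, total in sum_topic_weights(sample).items():
    # %g drops the trailing ".0", so 20.0 is printed as 20
    print("Topic:%s %g" % (name, total))
```

For the real data, something like `with open('Input.txt') as f: totals = sum_topic_weights(f)` should work, since the function only needs an iterable of lines.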