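Python - Summing/sorting ranges of numbers from a file

Date: 2014-06-24 23:05:58

Tags: python sorting python-2.7 csv formatting

This is my first post here, so apologies if I get anything wrong; I'll do my best to explain. I have two files. One is a CSV/TXT file named text1.txt in the following format: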

"13:02",10
"13:03",30
"13:04",15
"13:05",12
"13:06",3

...and the other is a (plain text) file named console1.txt with the following contents:

Rate limit: 5 at Thu Jun 12 13:02:00 PDT 2014 (Total missed: 5)
Rate limit: 10 at Thu Jun 12 13:02:01 PDT 2014 (Total missed: 15)
Rate limit: 17 at Thu Jun 12 13:02:06 PDT 2014 (Total missed: 32)
Rate limit: 10 at Thu Jun 12 13:05:50 PDT 2014 (Total missed: 42)
Rate limit: 14 at Thu Jun 12 13:05:53 PDT 2014 (Total missed: 56)
Rate limit: 84 at Thu Jun 12 13:05:21 PDT 2014 (Total missed: 140)
Rate limit: 2 at Thu Jun 12 13:06:30 PDT 2014 (Total missed: 142)
Rate limit: 5 at Thu Jun 12 13:06:34 PDT 2014 (Total missed: 147)

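I want to sum these numbers to get the total number "rate limited" per minute, then add those totals to the corresponding rows of the first CSV/TXT file. So the expected result would look like this: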

"13:02",42
"13:03",30
"13:04",15
"13:05",120
"13:06",10

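The numbers on lines whose timestamp starts with 13:02 get summed (5 + 10 + 17 = 32 in total) and added to the "13:02" row (32 + the original 10 = 42). The ones starting with 13:05 are added to the "13:05" row (10 + 14 + 84 = 108, plus the original 12 = 120), and so on.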

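I'm not sure how to approach processing the data, i.e. summing the numbers for each minute. It would help to figure out how to get data like this out of console.txt:

"13:02",32
"13:05",108
"13:06",7

From there, I can figure out how to add them to the corresponding CSV rows.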
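Here is a minimal sketch of the per-minute summing step. It assumes every line of console1.txt follows the "Rate limit: N at <day> <month> <date> HH:MM:SS ..." layout shown above. The regex, the function name per_minute_totals and the choice of OrderedDict (which keeps the minutes in first-seen order) are illustrative. None of them come from the original post.

```python
import re
from collections import OrderedDict

# Count and HH:MM of each line; assumes the "Rate limit: N at ... HH:MM:SS ..." layout.
LINE_RE = re.compile(r"Rate limit: (\d+) at .*? (\d{2}:\d{2}):\d{2} ")


def per_minute_totals(text):
    """Sum the rate-limit counts per HH:MM minute, in first-seen order."""
    totals = OrderedDict()
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip blank or unrecognised lines
        count, minute = int(match.group(1)), match.group(2)
        totals[minute] = totals.get(minute, 0) + count
    return totals


console = """Rate limit: 5 at Thu Jun 12 13:02:00 PDT 2014 (Total missed: 5)
Rate limit: 10 at Thu Jun 12 13:02:01 PDT 2014 (Total missed: 15)
Rate limit: 17 at Thu Jun 12 13:02:06 PDT 2014 (Total missed: 32)
Rate limit: 10 at Thu Jun 12 13:05:50 PDT 2014 (Total missed: 42)
Rate limit: 14 at Thu Jun 12 13:05:53 PDT 2014 (Total missed: 56)
Rate limit: 84 at Thu Jun 12 13:05:21 PDT 2014 (Total missed: 140)
Rate limit: 2 at Thu Jun 12 13:06:30 PDT 2014 (Total missed: 142)
Rate limit: 5 at Thu Jun 12 13:06:34 PDT 2014 (Total missed: 147)"""

for minute, total in per_minute_totals(console).items():
    print('"%s",%d' % (minute, total))
```

This prints the three "HH:MM",total lines listed above. To work from the real file, pass in its contents instead: with open("console1.txt") as f: totals = per_minute_totals(f.read()).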

Thanks!


Edit:

Thinking the process through, here are my steps (pseudocode in curly braces):

Let's say this is console.txt:

Rate limit: 5 at Thu Jun 12 13:02:00 PDT 2014 (Total missed: 5)
Rate limit: 10 at Thu Jun 12 13:02:01 PDT 2014 (Total missed: 15)
Rate limit: 5 at Thu Jun 12 13:06:34 PDT 2014 (Total missed: 20)

1) Read the file and cut off all unnecessary data

with open("console.txt") as f:
    temp = f.read()
temp = temp.replace("Rate limit: ", "")
temp = temp.replace(" at Thu Jun 12 ", ",")
{{ Remove the text between "PDT 2014 (" and ")" including both of those string, i.e. cut off everything after the seconds marker starting at "PDT" – this I can do myself }}
{{ Cut off the seconds of each minute – *stuck here* }}
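One way to fill in the two {{ }} steps is sketched below. After the two replace() calls, each line looks like 5,13:02:00 PDT 2014 (Total missed: 5). The hypothetical helper strip_line cuts off everything from " PDT" onwards. It then drops the seconds by splitting the HH:MM:SS field once at its last colon. Like the hard-coded replace() calls, it assumes every line has the same layout and the same PDT time zone.

```python
def strip_line(line):
    """'5,13:02:00 PDT 2014 (Total missed: 5)' -> '5,13:02' (hypothetical helper)."""
    # Drop the time zone, year and "(Total missed: ...)" part; assumes " PDT" is present.
    line = line.split(" PDT", 1)[0]
    count, hms = line.split(",", 1)
    # HH:MM:SS -> HH:MM: split once at the last colon and keep the left part.
    minute = hms.rsplit(":", 1)[0]
    return "%s,%s" % (count, minute)


# The three sample lines from above, inlined instead of read from console.txt.
temp = """Rate limit: 5 at Thu Jun 12 13:02:00 PDT 2014 (Total missed: 5)
Rate limit: 10 at Thu Jun 12 13:02:01 PDT 2014 (Total missed: 15)
Rate limit: 5 at Thu Jun 12 13:06:34 PDT 2014 (Total missed: 20)"""
temp = temp.replace("Rate limit: ", "")
temp = temp.replace(" at Thu Jun 12 ", ",")
temp = "\n".join(strip_line(line) for line in temp.splitlines())
print(temp)
```

This leaves one count,HH:MM pair per line (5,13:02 / 10,13:02 / 5,13:06), ready for the formatting step.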

2) Format the data

{{ Add quotes around the times and reverse the two columns – can figure this out }}

That would give me:

"13:02",5
"13:02",10
"13:06",5
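Step 2 can then be a small string rewrite for each line. In this sketch, the hypothetical helper to_csv_row takes the count,HH:MM lines from step 1, puts the minute first and wraps it in double quotes:

```python
def to_csv_row(line):
    """'5,13:02' -> '"13:02",5': quote the minute and swap the two columns."""
    count, minute = line.split(",")
    return '"%s",%s' % (minute, count)


stripped = "5,13:02\n10,13:02\n5,13:06"  # output of step 1 for the sample lines
temp = "\n".join(to_csv_row(line) for line in stripped.splitlines())
print(temp)
# "13:02",5
# "13:02",10
# "13:06",5
```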

3) Save to a new file

with open("file.txt", "w") as out:
    out.write(temp)
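If you keep the rows as (minute, count) pairs instead of one big string, Python's csv module can handle the quoting in step 3. With quoting=csv.QUOTE_NONNUMERIC, the writer quotes string fields and leaves numbers bare, which produces the "13:02",5 layout. This sketch assumes Python 3. On 2.7, you would open the file in "wb" mode and drop the newline argument. save_rows is a hypothetical helper name.

```python
import csv


def save_rows(rows, path):
    """Write (minute, count) pairs as '"HH:MM",N' lines."""
    # newline="" is what the csv docs recommend for files handed to csv.writer.
    with open(path, "w", newline="") as out:
        # QUOTE_NONNUMERIC quotes the str minute but not the int count;
        # lineterminator="\n" replaces the default "\r\n" line endings.
        writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
        writer.writerows(rows)


save_rows([("13:02", 5), ("13:02", 10), ("13:06", 5)], "file.txt")
```

This avoids building the quoted string by hand. It also keeps the counts as integers, which the later summing step needs anyway.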

From that point on, I can figure out how to add the numbers to the similar CSV file.
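Here is a sketch of that last step under the same assumptions. The first file has one "HH:MM",N row per minute, and the per-minute sums from the console log are already in a dict keyed by "HH:MM" (the "13:02",32 numbers above). csv.reader strips the surrounding quotes, so the minutes can be matched directly. merge_totals is a hypothetical name. In this version, minutes that appear only in the log are ignored.

```python
import csv


def merge_totals(csv_text, totals):
    """Add per-minute totals to the matching '"HH:MM",N' rows.

    Rows with no entry in totals keep their value; minutes that appear
    only in totals are not added as new rows.
    """
    merged = []
    # csv.reader accepts any iterable of lines and strips the quotes around HH:MM.
    for minute, value in csv.reader(csv_text.splitlines()):
        merged.append((minute, int(value) + totals.get(minute, 0)))
    return merged


text1 = '"13:02",10\n"13:03",30\n"13:04",15\n"13:05",12\n"13:06",3'
totals = {"13:02": 32, "13:05": 108, "13:06": 7}
for minute, value in merge_totals(text1, totals):
    print('"%s",%d' % (minute, value))
```

With the sample data, this prints the expected result from the question: "13:02",42, "13:03",30, "13:04",15, "13:05",120 and "13:06",10.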

1 Answer:

Answer 0 (score: 1)

A simple example (without reading or writing files):

csv = '''"13:02",10
"13:03",30
"13:04",15
"13:05",12
"13:06",3'''

rates = '''Rate limit: 5 at Thu Jun 12 13:02:00 PDT 2014 (Total missed: 5)
Rate limit: 10 at Thu Jun 12 13:02:01 PDT 2014 (Total missed: 15)
Rate limit: 17 at Thu Jun 12 13:02:06 PDT 2014 (Total missed: 32)
Rate limit: 10 at Thu Jun 12 13:05:50 PDT 2014 (Total missed: 42)
Rate limit: 14 at Thu Jun 12 13:05:53 PDT 2014 (Total missed: 56)
Rate limit: 84 at Thu Jun 12 13:05:21 PDT 2014 (Total missed: 140)
Rate limit: 2 at Thu Jun 12 13:06:30 PDT 2014 (Total missed: 142)
Rate limit: 5 at Thu Jun 12 13:06:34 PDT 2014 (Total missed: 147)'''

# --- example code ---

import re

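all_times = {}

# change csv into dict

for x in csv.splitlines():
    time, value = x.split(',')
    all_times[time] = int(value)

# print dict (print() with a single argument behaves the same on Python 2.7 and 3)

print('--- old ---')
for k, v in all_times.items():
    print('%s %d' % (k, v))

# add rates to dict: capture the count and the HH:MM part of the timestamp
# (raw string, so \d reaches the regex engine unchanged)

for x in rates.splitlines():
    value, time = re.findall(r'Rate limit: (\d+) .* (\d+:\d+):', x)[0]
    all_times['"%s"' % time] += int(value)

# print dict

print('--- new ---')
for k, v in all_times.items():
    print('%s %d' % (k, v))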

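Output (rows follow dict iteration order, which is arbitrary in Python 2.7, so they are not sorted by time):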

--- old ---
"13:04" 15
"13:05" 12
"13:02" 10
"13:03" 30
"13:06" 3
--- new ---
"13:04" 15
"13:05" 120
"13:02" 42
"13:03" 30
"13:06" 10
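One caveat with the code above: all_times['"%s"' % time] += int(value) raises KeyError if console1.txt contains a minute that has no row in the CSV. The sketch below is a variant built on collections.Counter, which treats missing keys as 0. It also prints the rows sorted by time instead of in dict order. The function name add_rates and the 13:07 log line in the usage example are hypothetical; that line was made up only to exercise the missing-minute case.

```python
import re
from collections import Counter

# Same pattern as in the answer: the count, then the HH:MM part of the timestamp.
RATE_RE = re.compile(r"Rate limit: (\d+) .* (\d+:\d+):")


def add_rates(csv_text, rates_text):
    """Return a Counter mapping '"HH:MM"' to the CSV value plus rate-limit counts."""
    totals = Counter()
    for line in csv_text.splitlines():
        time, value = line.split(",")
        totals[time] += int(value)
    for line in rates_text.splitlines():
        match = RATE_RE.search(line)
        if match:  # skip lines that are not "Rate limit: ..." entries
            value, time = match.groups()
            # A minute missing from the CSV starts at 0 instead of raising KeyError.
            totals['"%s"' % time] += int(value)
    return totals


csv_text = '"13:02",10\n"13:06",3'
# The second log line is hypothetical: 13:07 has no row in csv_text.
rates_text = (
    "Rate limit: 5 at Thu Jun 12 13:02:00 PDT 2014 (Total missed: 5)\n"
    "Rate limit: 9 at Thu Jun 12 13:07:10 PDT 2014 (Total missed: 14)"
)
for time, total in sorted(add_rates(csv_text, rates_text).items()):
    print("%s %d" % (time, total))
# "13:02" 15
# "13:06" 3
# "13:07" 9
```

Whether a minute that exists only in the log should become a new row depends on what the file is used for. If it should be dropped instead, skip keys that were not in the CSV.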