Question

I have the following data in a text file named data.txt:

03/05/2016 16:43  502
03/05/2016 16:43  502
03/05/2016 16:44  501
03/05/2016 16:44  504
03/05/2016 16:44  505
03/05/2016 16:44  506
04/05/2016 16:44  501
04/05/2016 16:45  501
04/05/2016 16:45  501
04/05/2016 16:45  52
04/05/2016 17:08  50
05/05/2016 17:08  502
05/05/2016 17:08  503
05/05/2016 17:08  504
05/05/2016 17:09  506
06/05/2016 17:09  507
06/05/2016 17:09  507
07/05/2016 17:09  508
07/05/2016 17:09  50
08/05/2016 17:10  5
08/05/2016 17:10  504
09/05/2016 17:10  504
09/05/2016 17:10  503
09/05/2016 17:10  503
10/05/2016 17:11  505
10/05/2016 17:11  505

I want to do some arithmetic on it so that the final result looks like this:

03/05/2016   3020
04/05/2016   1605
05/05/2016   2015
06/05/2016   1014
07/05/2016   558
08/05/2016   509
09/05/2016   1510
10/05/2016   1010

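The second column is the sum of the values for each date.

The result should be stored in another text file, e.g. data1.txt.

I want to write this in Python 2.7.

How can I achieve this?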

Answer 1

You can use Counter to sum the values for each date:

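from collections import Counter

with open('data.txt') as f:
    # Each line splits into date, time and count; wrap every count in a
    # one-entry Counter and add them all up, starting from an empty Counter.
    res = sum((Counter({d: int(c)}) for d, t, c in (line.split() for line in f)), Counter())

with open('data1.txt', 'wb') as f:
    # One "date<TAB>total" line per date, sorted by the date string.
    f.writelines('{0}\t{1}\n'.format(*x) for x in sorted(res.items()))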

Output:

03/05/2016  3020
04/05/2016  1605
05/05/2016  2015
06/05/2016  1014
07/05/2016  558
08/05/2016  509
09/05/2016  1510
10/05/2016  1010

This solution doesn't require any libraries beyond the standard Python installation.
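
One caveat about the sorting step: sorted(res.items()) compares the dates as strings. That works for the sample, where every row falls in the same month, but dd/mm/yyyy strings do not sort chronologically once the data spans more than one month. Below is a minimal sketch of one way around this, assuming the dates are day-first. The helper names sum_by_date and chronological and the three sample rows are made up for illustration. It also accumulates into a single Counter instead of adding many one-entry Counters together.

```python
from collections import Counter
from datetime import datetime


def sum_by_date(lines):
    """Sum the third whitespace-separated field for each date (first field)."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        totals[fields[0]] += int(fields[2])
    return totals


def chronological(totals, fmt='%d/%m/%Y'):
    """Return (date, total) pairs ordered by the parsed date (day-first assumed)."""
    return sorted(totals.items(),
                  key=lambda item: datetime.strptime(item[0], fmt))


# Hypothetical rows spanning two months, not taken from data.txt.
sample = ['30/04/2016 16:43  502',
          '01/05/2016 16:44  501',
          '30/04/2016 16:45  5']
rows = chronological(sum_by_date(sample))
```

A plain sorted() on the same totals would put 01/05/2016 before 30/04/2016; sorting on the parsed date keeps April ahead of May.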

Answer 2

A pure Python solution:

import collections

data=collections.defaultdict(int)
with open('data.txt', 'r') as f:
    for line in f:
        row=line.split()
        data[row[0]]+=int(row[2])

with open('data1.txt', 'w') as f:
    for key, value in sorted(data.items()):
        f.write(str(key)+" "+str(value)+"\n")

Output:

$ python a.py 
$ cat data1.txt 
03/05/2016 3020
04/05/2016 1605
05/05/2016 2015
06/05/2016 1014
07/05/2016 558
08/05/2016 509
09/05/2016 1510
10/05/2016 1010
$

Answer 3

You can use the following:

from collections import OrderedDict

# OrderedDict keeps the dates in the order they first appear in the file.
res = OrderedDict()
with open('data.txt') as f:
    for line in f:
        # split() with no argument collapses runs of whitespace, so the
        # double space before the value does not produce an empty field.
        values = line.split()
        if len(values) == 3:
            date, val = values[0], values[2]
            res[date] = res.get(date, 0) + int(val)

with open('data1.txt', 'w') as f:
    for date, total in res.items():
        f.write('{} {}\n'.format(date, total))
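
Because data.txt is already grouped by date, the totals can also be computed in a single pass without keeping a dictionary, using itertools.groupby. This is only a sketch and it relies on that ordering assumption: if rows for one date were scattered through the file, groupby would emit that date more than once. The function name daily_totals is made up for illustration; the sample rows are the 08/05 and 09/05 lines from the question.

```python
from itertools import groupby


def daily_totals(lines):
    """Yield (date, total) for each run of consecutive lines sharing a date."""
    rows = (line.split() for line in lines)
    rows = (row for row in rows if len(row) == 3)  # skip blank/malformed lines
    for date, group in groupby(rows, key=lambda row: row[0]):
        yield date, sum(int(row[2]) for row in group)


sample = ['08/05/2016 17:10  5',
          '08/05/2016 17:10  504',
          '09/05/2016 17:10  504',
          '09/05/2016 17:10  503',
          '09/05/2016 17:10  503']
totals = list(daily_totals(sample))
```

To process the real file, pass the open file object to daily_totals in place of the sample list and write each (date, total) pair to data1.txt as it is produced.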

Answer 4

Setup

import pandas as pd
from StringIO import StringIO

text = """03/05/2016 16:43  502
03/05/2016 16:43  502
03/05/2016 16:44  501
03/05/2016 16:44  504
03/05/2016 16:44  505
03/05/2016 16:44  506
04/05/2016 16:44  501
04/05/2016 16:45  501
04/05/2016 16:45  501
04/05/2016 16:45  52
04/05/2016 17:08  50
05/05/2016 17:08  502
05/05/2016 17:08  503
05/05/2016 17:08  504
05/05/2016 17:09  506
06/05/2016 17:09  507
06/05/2016 17:09  507
07/05/2016 17:09  508
07/05/2016 17:09  50
08/05/2016 17:10  5
08/05/2016 17:10  504
09/05/2016 17:10  504
09/05/2016 17:10  503
09/05/2016 17:10  503
10/05/2016 17:11  505
10/05/2016 17:11  505"""

df = pd.read_csv(StringIO(text), delim_whitespace=True,
                 parse_dates=[0], names=['date', 'time', 'value'])

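This looks like the following. Note that parse_dates uses pandas' default month-first parsing, so 03/05/2016 is displayed as 2016-03-05; the per-date grouping and the sums are unaffected.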

         date   time  value
0  2016-03-05  16:43    502
1  2016-03-05  16:43    502
2  2016-03-05  16:44    501
3  2016-03-05  16:44    504
4  2016-03-05  16:44    505
5  2016-03-05  16:44    506
6  2016-04-05  16:44    501
7  2016-04-05  16:45    501
8  2016-04-05  16:45    501
9  2016-04-05  16:45     52
10 2016-04-05  17:08     50
11 2016-05-05  17:08    502
12 2016-05-05  17:08    503
13 2016-05-05  17:08    504
14 2016-05-05  17:09    506
15 2016-06-05  17:09    507
16 2016-06-05  17:09    507
17 2016-07-05  17:09    508
18 2016-07-05  17:09     50
19 2016-08-05  17:10      5
20 2016-08-05  17:10    504
21 2016-09-05  17:10    504
22 2016-09-05  17:10    503
23 2016-09-05  17:10    503
24 2016-10-05  17:11    505
25 2016-10-05  17:11    505

Solution

# Select 'value' explicitly so the string 'time' column is not aggregated.
df.groupby('date')['value'].sum()

Using Python 2.7.
