Python groupby problem

Time: 2014-05-14 06:04:25

Tags: python numpy pandas

Sample raw data:

DATE,DUR,TYPE
2013-10-11,15,PTG
2013-10-11,110,PV1
2013-10-11,122,RG1
2013-10-11,31,SG2

I'm using Python, and I have a list (a sample is shown below):

list=[['10/15/2013', 'PTG', '19'],
     ['10/15/2013', 'PV1', '219'],
     ['10/15/2013', 'PVG', '13'],
     ['10/15/2013', 'RG1', '112'],
     ['10/15/2013', 'SG2', '438'],
     ['10/12/2013', 'PV1', '110'],
     ['10/12/2013', 'PVG', '9'],
     ['10/12/2013', 'RG1', '25'],
     ['10/12/2013', 'SG2', '48']]

I want the list (the aggregated result) to look like this:

         #Date      PV1 PVG RG1 SG2
result=[[10/15/2013,219,13,112,438],
        [10/12/2013,110,9,25,48]]

Here is my code:

from itertools import groupby
datetime1=range(10/11/2013,10/15/2013)
chunks=[]
for datetime in datetime1:
    count=[datetime]
    path='/user_home/w_andalib_dvpy/sample_data/3d_sample.csv'
    file=open(path)
    data=csv.reader(file)
    table=[row for row in data]
    for key,group in groupby(table,lambda x: x[2]):
        total=0
        for item in group:
            total +=int(item[1])
        if   item[2]=='PV1':
             count[1]=total
        elif item[2]=='PVG':
             count[2]=total
        elif item[2]=='RG1':
             count[3]=total
        elif item[2]=='SG2':
       print count
    chunks.append(count)

But I'm not getting any results.

2 Answers:

Answer 0: (score: 1)

Use a dictionary like this and extract its list of values:

list=[['10/15/2013', 'PTG', '19'],
     ['10/15/2013', 'PV1', '219'],
     ['10/15/2013', 'PVG', '13'],
     ['10/15/2013', 'RG1', '112'],
     ['10/15/2013', 'SG2', '438'],
     ['10/12/2013', 'PV1', '110'],
     ['10/12/2013', 'PVG', '9'],
     ['10/12/2013', 'RG1', '25'],
     ['10/12/2013', 'SG2', '48']]

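# One entry per date, each starting with the date itself.
my_dict = {'10/15/2013': ['10/15/2013'],  '10/12/2013': ['10/12/2013']}

# Append every duration (still a string) to the entry for its date.
for elem in list:
    my_dict[elem[0]].append(elem[2])

print(my_dict.values())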
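The dictionary above has its dates typed in by hand and also picks up the PTG row. A minimal sketch of the same idea that avoids both, assuming each date's rows arrive in the same type order as in the sample; the names `rows`, `wanted` and `by_date` are made up for illustration, and `rows` is used instead of `list` so the built-in is not shadowed:

```python
from collections import defaultdict

# Sample rows copied from the question: [date, type, duration].
rows = [['10/15/2013', 'PTG', '19'],
        ['10/15/2013', 'PV1', '219'],
        ['10/15/2013', 'PVG', '13'],
        ['10/15/2013', 'RG1', '112'],
        ['10/15/2013', 'SG2', '438'],
        ['10/12/2013', 'PV1', '110'],
        ['10/12/2013', 'PVG', '9'],
        ['10/12/2013', 'RG1', '25'],
        ['10/12/2013', 'SG2', '48']]

# Types wanted in the output (PTG is left out on purpose).
wanted = ['PV1', 'PVG', 'RG1', 'SG2']

# One list of durations per date; a new date gets an empty list on first use.
by_date = defaultdict(list)
for date, kind, dur in rows:
    if kind in wanted:
        by_date[date].append(int(dur))

# Assumption: values were appended in the same type order for every date.
result = [[date] + values for date, values in by_date.items()]
print(result)
```

If the rows can arrive in any order, or a type can be missing for some date, a nested dictionary keyed by type is the safer choice.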

Answer 1: (score: 0)

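One approach is to collect the rows into a dictionary and then convert that data into a new list of lists. I don't think the extra machinery of groupby helps here.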

d = {}  # maps each date to a {type: duration} dictionary
for l in list:
    a = d.get(l[0], {})  # inner dictionary for this date, or a new one
    a[l[1]] = int(l[2])  # store the duration as an integer under its type
    d[l[0]] = a
result = [[k, v['PV1'], v['PVG'], v['RG1'], v['SG2']] for k, v in d.items()]

The dictionary looks like:

{'10/12/2013': {'PV1': 110, 'PVG': 9, 'RG1': 25, 'SG2': 48},
 '10/15/2013': {'PTG': 19, 'PV1': 219, 'PVG': 13, 'RG1': 112, 'SG2': 438}}

result looks like:

[['10/12/2013', 110, 9, 25, 48], 
 ['10/15/2013', 219, 13, 112, 438]]
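For comparison, here is a rough sketch of what a groupby version might look like. It is not the asker's code. `itertools.groupby` only merges consecutive rows with equal keys, so the rows have to be sorted by date first. The names `rows`, `wanted` and `grouped_result` are made up for illustration, and filling a missing type with 0 is an assumption:

```python
from itertools import groupby
from operator import itemgetter

# Sample rows copied from the question: [date, type, duration].
rows = [['10/15/2013', 'PTG', '19'],
        ['10/15/2013', 'PV1', '219'],
        ['10/15/2013', 'PVG', '13'],
        ['10/15/2013', 'RG1', '112'],
        ['10/15/2013', 'SG2', '438'],
        ['10/12/2013', 'PV1', '110'],
        ['10/12/2013', 'PVG', '9'],
        ['10/12/2013', 'RG1', '25'],
        ['10/12/2013', 'SG2', '48']]

wanted = ['PV1', 'PVG', 'RG1', 'SG2']

# Sort by the date string so equal dates sit next to each other.
rows_sorted = sorted(rows, key=itemgetter(0))

grouped_result = []
for date, group in groupby(rows_sorted, key=itemgetter(0)):
    # The sample has one row per (date, type); a repeated type would overwrite.
    totals = {kind: int(dur) for _, kind, dur in group}
    # Assumption: a type with no row for this date is reported as 0.
    grouped_result.append([date] + [totals.get(kind, 0) for kind in wanted])

print(grouped_result)
```

The sort plus the per-group dictionary is the extra machinery: the plain dictionary version needs neither.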

Don't forget that '10/12/2013' is a string, not a number. You can't treat it like a number until you convert it to a date.
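This also matters for the loop in the question: written without quotes, 10/11/2013 is just two divisions, not a date. A small sketch of parsing the strings with `datetime.strptime` and building a range of days, assuming the month/day/year format of the sample list and an inclusive end date (both are assumptions, as are the names `DATE_FORMAT`, `start`, `end` and `days`):

```python
from datetime import datetime, timedelta

DATE_FORMAT = '%m/%d/%Y'  # month/day/year, as in the sample list

# Parse the strings into real date objects.
start = datetime.strptime('10/11/2013', DATE_FORMAT).date()
end = datetime.strptime('10/15/2013', DATE_FORMAT).date()

# Once parsed, dates can be compared and stepped through one day at a time.
# Assumption: the end date is included in the range.
days = [start + timedelta(days=n) for n in range((end - start).days + 1)]

# Back to the same string form used as keys in the list.
print([d.strftime(DATE_FORMAT) for d in days])
```

The CSV sample at the top uses a different format (2013-10-11), which would need '%Y-%m-%d' instead.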