Using pandas to get the sum of a field (column) from multiple CSV files

Date: 2017-06-20 17:39:17

Tags: python csv

I have multiple CSV files like these:

A.csv

day,pid,wscore,lscore,wk,wd,lk,ld
1,"A",1,0,1,0,0,1
4,"A",2,0,1,0,1,1
5,"A",1,1,2,0,0,2
6,"A",1,0,2,1,1,2
...

B.csv

day,pid,wscore,lscore,wk,wd,lk,ld
1,"B",1,0,2,1,1,2
2,"B",2,2,2,1,0,1
4,"B",2,2,2,2,2,1
5,"B",2,1,1,1,1,1
...

C.csv

day,pid,wscore,lscore,wk,wd,lk,ld
2,"C",2,1,2,1,2,2
3,"C",1,2,2,1,2,2
5,"C",2,2,2,1,1,1
...

I want to group the rows by day, merge the pid values for each day, and take the mean of wscore, lscore, wk, wd, lk, and ld.

Example output.csv:

day,pid,wscore,lscore,wk,wd,lk,ld
1,"A,B",1,0,1.5,0.5,0.5,1.5
2,"B,C",2,1.5,2, ...
3,"C",1,2,2, ...
4,"A,B",2,1, ...
5,"A,B,C",1.67, ...
...
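As a check on what "mean" means here: each output value is the plain average over all rows for that day, regardless of which file they came from. For day 5, wscore is 1 in A.csv, 2 in B.csv and 2 in C.csv, which gives the 1.666667 shown in the answer's output:

```latex
\overline{\mathrm{wscore}}_{\text{day } 5} = \frac{1 + 2 + 2}{3} = \frac{5}{3} \approx 1.67
```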

How can I do this? Thanks.

1 answer:

Answer 0 (score: 0)

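First read each file and stack all the rows into one DataFrame:

import pandas as pd
df = pd.concat([pd.read_csv(f) for f in ['A.csv', 'B.csv', 'C.csv']], axis=0)  # same as A = pd.read_csv('A.csv'), etc.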
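If there are many files, a minimal sketch can collect them with glob instead of naming each one. It assumes all the CSVs share the same header and live in one directory; the temporary directory and the two small sample files written below are hypothetical stand-ins for the real data, there only so the snippet runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical setup: write two small sample files so the sketch is self-contained.
tmpdir = tempfile.mkdtemp()
samples = {
    "A.csv": 'day,pid,wscore,lscore,wk,wd,lk,ld\n1,"A",1,0,1,0,0,1\n4,"A",2,0,1,0,1,1\n',
    "B.csv": 'day,pid,wscore,lscore,wk,wd,lk,ld\n1,"B",1,0,2,1,1,2\n2,"B",2,2,2,1,0,1\n',
}
for name, text in samples.items():
    with open(os.path.join(tmpdir, name), "w") as fh:
        fh.write(text)

# Pick up every CSV in the directory (sorted for a stable row order).
paths = sorted(glob.glob(os.path.join(tmpdir, "*.csv")))

# Stack the rows of all files; ignore_index gives a fresh 0..n-1 index.
df = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)
print(df)
```

ignore_index=True is optional: without it the stacked frame keeps each file's own 0-based index, so index labels repeat. The groupby on day does not depend on either choice.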

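If you don't need commas between the pid values, summing the string column concatenates them:

pd.concat([df.groupby('day')['pid'].sum(), df.groupby('day').mean(numeric_only=True)], axis=1)

(numeric_only=True skips the string pid column when averaging; in pandas 2.x, mean() raises a TypeError without it.)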
Out[297]: 
     pid    wscore    lscore        wk        wd        lk        ld
day                                                                 
1     AB  1.000000  0.000000  1.500000  0.500000  0.500000  1.500000
2     BC  2.000000  1.500000  2.000000  1.000000  1.000000  1.500000
3      C  1.000000  2.000000  2.000000  1.000000  2.000000  2.000000
4     AB  2.000000  1.000000  1.500000  1.000000  1.500000  1.000000
5    ABC  1.666667  1.333333  1.666667  0.666667  0.666667  1.333333
6      A  1.000000  0.000000  2.000000  1.000000  1.000000  2.000000
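The same table can also be built with a single groupby by giving agg one aggregation per column. This is an alternative sketch, not the answer's method; the small frame below hard-codes the day 1 and day 2 rows from the question's sample files so it runs on its own.

```python
import pandas as pd

# Day 1 and day 2 rows from A.csv, B.csv and C.csv, already stacked.
df = pd.DataFrame({
    "day":    [1, 1, 2, 2],
    "pid":    ["A", "B", "B", "C"],
    "wscore": [1, 1, 2, 2],
    "lscore": [0, 0, 2, 1],
    "wk":     [1, 2, 2, 2],
    "wd":     [0, 1, 1, 1],
    "lk":     [0, 1, 0, 2],
    "ld":     [1, 2, 1, 2],
})

score_cols = ["wscore", "lscore", "wk", "wd", "lk", "ld"]

# One spec: join the pid strings with ',' and average every score column.
spec = {"pid": ",".join, **{c: "mean" for c in score_cols}}
result = df.groupby("day").agg(spec)
print(result)
```

Replacing ",".join with "sum" in the spec gives the comma-free pid column from the first variant.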

If you do need the commas, join the pid values of each group:

pd.concat([df.groupby('day')['pid'].apply(','.join), df.groupby('day').mean(numeric_only=True)], axis=1)

Out[300]: 
       pid    wscore    lscore        wk        wd        lk        ld
day                                                                   
1      A,B  1.000000  0.000000  1.500000  0.500000  0.500000  1.500000
2      B,C  2.000000  1.500000  2.000000  1.000000  1.000000  1.500000
3        C  1.000000  2.000000  2.000000  1.000000  2.000000  2.000000
4      A,B  2.000000  1.000000  1.500000  1.000000  1.500000  1.000000
5    A,B,C  1.666667  1.333333  1.666667  0.666667  0.666667  1.333333
6        A  1.000000  0.000000  2.000000  1.000000  1.000000  2.000000
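To end up with an output.csv like the one in the question, one possible sketch moves day from the index back into a column with reset_index and writes without the index. The two-row frame is a hypothetical stand-in for the result above, and io.StringIO replaces a real file so the text can be inspected; passing "output.csv" as the path writes to disk instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the grouped result (day is the index).
result = pd.DataFrame(
    {"pid": ["A,B", "B,C"], "wscore": [1.0, 2.0], "lscore": [0.0, 1.5]},
    index=pd.Index([1, 2], name="day"),
)

buf = io.StringIO()
# reset_index turns 'day' back into an ordinary first column.
result.reset_index().to_csv(buf, index=False)
text = buf.getvalue()
print(text)

# Reading it back: the quoted "A,B" stays one field.
back = pd.read_csv(io.StringIO(text))
```

to_csv quotes fields that contain the delimiter by default (csv.QUOTE_MINIMAL), so pid values such as A,B survive the round trip intact.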