I have multiple CSV files like these:
A.csv
day,pid,wscore,lscore,wk,wd,lk,ld
1,"A",1,0,1,0,0,1
4,"A",2,0,1,0,1,1
5,"A",1,1,2,0,0,2
6,"A",1,0,2,1,1,2
...
B.csv
day,pid,wscore,lscore,wk,wd,lk,ld
1,"B",1,0,2,1,1,2
2,"B",2,2,2,1,0,1
4,"B",2,2,2,2,2,1
5,"B",2,1,1,1,1,1
...
C.csv
day,pid,wscore,lscore,wk,wd,lk,ld
2,"C",2,1,2,1,2,2
3,"C",1,2,2,1,2,2
5,"C",2,2,2,1,1,1
...
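I want to merge all rows that share the same day: the pids should be combined into one field, and wscore, lscore, wk, wd, lk and ld should be averaged.
Example output.csv: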
day,pid,wscore,lscore,wk,wd,lk,ld
1,"A,B",1,0,1.5,0.5,0.5,1.5
2,"B,C",2,1.5,2, ...
3,"C",1,2,2, ...
4,"A,B",2,1, ...
5,"A,B,C",1.67, ...
...
How can I do this? Thanks.
Answer (score: 0)
import pandas as pd
df = pd.concat([pd.read_csv(f) for f in ['A.csv', 'B.csv', 'C.csv']], axis=0)  # stack all rows
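With many files, listing each name by hand gets tedious. A minimal sketch, assuming all input files sit in one directory and match a glob pattern such as `*.csv` (make sure the pattern does not also pick up output.csv). The helper name `load_all` is hypothetical, and the demo writes two tiny stand-in files to a temporary directory only so the example runs on its own:

```python
import glob
import os
import tempfile

import pandas as pd


def load_all(pattern: str) -> pd.DataFrame:
    """Hypothetical helper: read every CSV matching `pattern` and stack the rows."""
    paths = sorted(glob.glob(pattern))  # sorted so the file order is predictable
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)


# Demo only: write two small stand-in files, load them back, then clean up.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "A.csv"), "w") as f:
        f.write('day,pid,wscore\n1,"A",1\n4,"A",2\n')
    with open(os.path.join(tmp, "B.csv"), "w") as f:
        f.write('day,pid,wscore\n1,"B",1\n2,"B",2\n')
    df = load_all(os.path.join(tmp, "*.csv"))

print(df)
```

Because the paths are sorted, the rows of A.csv come before those of B.csv, which also fixes the order in which pids are later joined for each day.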
If you don't need a separator between the pids:
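# numeric_only=True keeps the string pid column out of the mean; newer pandas raises a TypeError without it
pd.concat([df.groupby('day')['pid'].sum(), df.groupby('day').mean(numeric_only=True)], axis=1)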
Out[297]:
pid wscore lscore wk wd lk ld
day
1 AB 1.000000 0.000000 1.500000 0.500000 0.500000 1.500000
2 BC 2.000000 1.500000 2.000000 1.000000 1.000000 1.500000
3 C 1.000000 2.000000 2.000000 1.000000 2.000000 2.000000
4 AB 2.000000 1.000000 1.500000 1.000000 1.500000 1.000000
5 ABC 1.666667 1.333333 1.666667 0.666667 0.666667 1.333333
6 A 1.000000 0.000000 2.000000 1.000000 1.000000 2.000000
If you need the pids separated by ',':
pd.concat([df.groupby('day')['pid'].agg(','.join), df.groupby('day').mean(numeric_only=True)], axis=1)
Out[300]:
pid wscore lscore wk wd lk ld
day
1 A,B 1.000000 0.000000 1.500000 0.500000 0.500000 1.500000
2 B,C 2.000000 1.500000 2.000000 1.000000 1.000000 1.500000
3 C 1.000000 2.000000 2.000000 1.000000 2.000000 2.000000
4 A,B 2.000000 1.000000 1.500000 1.000000 1.500000 1.000000
5 A,B,C 1.666667 1.333333 1.666667 0.666667 0.666667 1.333333
6 A 1.000000 0.000000 2.000000 1.000000 1.000000 2.000000
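The same result can also be produced in a single `groupby(...).agg(...)` call by passing a dict that maps each column to its aggregation, which avoids grouping twice and lets you write output.csv directly. A sketch using only the sample rows shown in the question (the real files contain more rows, elided as "..."); the inline strings stand in for the actual files:

```python
import io

import pandas as pd

# Only the rows shown in the question; the real files contain more.
A_CSV = '''day,pid,wscore,lscore,wk,wd,lk,ld
1,"A",1,0,1,0,0,1
4,"A",2,0,1,0,1,1
5,"A",1,1,2,0,0,2
6,"A",1,0,2,1,1,2
'''
B_CSV = '''day,pid,wscore,lscore,wk,wd,lk,ld
1,"B",1,0,2,1,1,2
2,"B",2,2,2,1,0,1
4,"B",2,2,2,2,2,1
5,"B",2,1,1,1,1,1
'''
C_CSV = '''day,pid,wscore,lscore,wk,wd,lk,ld
2,"C",2,1,2,1,2,2
3,"C",1,2,2,1,2,2
5,"C",2,2,2,1,1,1
'''

# Stack in A, B, C order so the joined pids come out in that order.
df = pd.concat(
    [pd.read_csv(io.StringIO(text)) for text in (A_CSV, B_CSV, C_CSV)],
    ignore_index=True,
)

score_cols = ["wscore", "lscore", "wk", "wd", "lk", "ld"]

# One groupby pass: comma-join the pids, average every score column.
# The dict's key order also fixes the column order of the result.
agg_spec = {"pid": ",".join, **{col: "mean" for col in score_cols}}
out = df.groupby("day").agg(agg_spec)

# With no path, to_csv() returns the CSV text; "day" is written as the index column.
csv_text = out.to_csv(float_format="%.4g")
print(csv_text)
```

To write the file itself, pass a path instead, e.g. `out.to_csv("output.csv", float_format="%.4g")`. Joined pids such as A,B contain commas, so to_csv wraps them in double quotes under its default minimal quoting, which matches the format of the example output.csv.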