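I'm just getting into Pandas and am trying to generate a spreadsheet for cars. I like pandas, but progress is slow. I'm trying to generate new columns holding some sums...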
import pandas as pd
data = pd.DataFrame({"Car":["Hyundai","Hyundai","Honda", "Honda"], "Type":["Accent", "Accent", "Civic", "Civic"], "Trans":["Auto", "Manual", "Auto", "Manual"], "TOTAL":[2,4,5,3]})
print(data)
print(data.groupby(['Car', 'Type', 'Trans'])['TOTAL'].sum())
I get the entirely predictable...
       Car  TOTAL   Trans    Type
0  Hyundai      2    Auto  Accent
1  Hyundai      4  Manual  Accent
2    Honda      5    Auto   Civic
3    Honda      3  Manual   Civic
Car      Type    Trans
Honda    Civic   Auto      5
                 Manual    3
Hyundai  Accent  Auto      2
                 Manual    4
Ideally, what I'd like is...
Car      Type    Auto  Manual  Total
Honda    Civic      5       3      8
Hyundai  Accent     2       4      6
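My knowledge of Pandas isn't great (yet). I'm guessing the answer involves apply() or agg(), but so far I've only been banging my head against syntax errors. I'd appreciate any pointers in the right direction. JW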
Answer 0 (score: 3)
To do this with built-in pandas methods: set the 'Car', 'Type', and 'Trans' columns as the index, call unstack() to get the TOTAL for each subgroup as its own column, and then sum across the columns:
data = pd.DataFrame({"Car":["Hyundai","Hyundai","Honda", "Honda"], "Type":["Accent", "Accent", "Civic", "Civic"], "Trans":["Auto", "Manual", "Auto", "Manual"], "TOTAL":[2,4,5,3]}).set_index(['Car', 'Type', 'Trans'])
total_by_trans = data.unstack().loc[:, 'TOTAL'] # to get rid of the column MultiIndex created by unstack()
total_by_trans['Total'] = total_by_trans.sum(axis=1)
total_by_trans.columns.name = None # just cleaning up
                Auto  Manual  Total
Car     Type
Honda   Civic      5       3      8
Hyundai Accent     2       4      6
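The same reshaping can also start from the groupby in the question. This is a minimal sketch rather than the answerer's exact method: it assumes each (Car, Type, Trans) combination should be summed and that a missing combination should count as 0. The name `wide` is introduced here for illustration only.

```python
import pandas as pd

data = pd.DataFrame({
    "Car": ["Hyundai", "Hyundai", "Honda", "Honda"],
    "Type": ["Accent", "Accent", "Civic", "Civic"],
    "Trans": ["Auto", "Manual", "Auto", "Manual"],
    "TOTAL": [2, 4, 5, 3],
})

# Sum TOTAL per (Car, Type, Trans), then pivot the Trans level into columns.
# fill_value=0 is an assumption: a car with no "Manual" rows gets 0, not NaN.
wide = (
    data.groupby(["Car", "Type", "Trans"])["TOTAL"]
    .sum()
    .unstack("Trans", fill_value=0)
)

# Row-wise total across the Auto and Manual columns.
wide["Total"] = wide.sum(axis=1)
wide.columns.name = None  # drop the leftover "Trans" label on the columns

print(wide)
```

Because the groupby sums before unstacking, this variant also tolerates duplicate (Car, Type, Trans) rows, which a plain set_index followed by unstack() would reject.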
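Answer 1 (score: 1)
You can prepare two new Series in the DataFrame ahead of time, one holding the Auto counts and one holding the Manual counts.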
data['total_manual'] = data['TOTAL'] * (data['Trans'] == 'Manual').astype(int)
data['total_auto'] = data['TOTAL'] * (data['Trans'] == 'Auto').astype(int)
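# Select several columns with a list (double brackets); recent pandas rejects the bare-tuple form
print(data.groupby(['Car', 'Type'])[['total_auto', 'total_manual', 'TOTAL']].sum())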
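A similar approach is to use a pivot table with margins.
# margins=True adds an 'All' row and column; aggfunc='sum' avoids needing a NumPy import
pvt = pd.pivot_table(data, index=['Car', 'Type'], columns='Trans', values='TOTAL', margins=True, aggfunc='sum')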
pvt = pvt.drop(('All',''), axis=0)
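For reference, here is a self-contained sketch of the pivot-table route that names the margin column "Total" to match the layout asked for in the question. Using `margins_name` and filtering on the first index level are choices made for this sketch, not part of the answer above.

```python
import pandas as pd

data = pd.DataFrame({
    "Car": ["Hyundai", "Hyundai", "Honda", "Honda"],
    "Type": ["Accent", "Accent", "Civic", "Civic"],
    "Trans": ["Auto", "Manual", "Auto", "Manual"],
    "TOTAL": [2, 4, 5, 3],
})

# margins=True appends a summary row and column; margins_name sets their label.
pvt = pd.pivot_table(
    data,
    index=["Car", "Type"],
    columns="Trans",
    values="TOTAL",
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)

# Keep the "Total" column but discard the grand-total row,
# which is labelled "Total" in the first index level.
pvt = pvt[pvt.index.get_level_values(0) != "Total"]
pvt.columns.name = None  # drop the leftover "Trans" label on the columns

print(pvt)
```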