Conditional sums for a Python DataFrame

Asked: 2015-12-28 17:41:12

Tags: python pandas dataframe

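I'm just getting into pandas and trying to generate a spreadsheet for cars. I like pandas, but progress is slow, and I'm trying to generate new columns that hold some sums...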

import pandas as pd

data = pd.DataFrame({"Car":["Hyundai","Hyundai","Honda", "Honda"], "Type":["Accent", "Accent", "Civic", "Civic"], "Trans":["Auto", "Manual", "Auto", "Manual"], "TOTAL":[2,4,5,3]})

print(data)

print(data.groupby(['Car', 'Type', 'Trans'])['TOTAL'].sum())

I get the entirely predictable...

       Car  TOTAL   Trans    Type
0  Hyundai      2    Auto  Accent
1  Hyundai      4  Manual  Accent
2    Honda      5    Auto   Civic
3    Honda      3  Manual   Civic

Car      Type    Trans 
Honda    Civic   Auto      5
                 Manual    3
Hyundai  Accent  Auto      2
                 Manual    4

Ideally, what I'd like is...

Car       Type    Auto    Manual  Total
Honda     Civic     5        3      8
Hyundai   Accent    2        4      6

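My pandas knowledge isn't great (yet), but I'm guessing the answer involves "apply" or the agg() function. So far I've just been banging my head against syntax errors, so I'd appreciate any pointers in the right direction. JW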

2 answers:

Answer 0 (score: 3)

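To use built-in pandas methods, you can set the 'Car', 'Type', 'Trans' columns as the index, call unstack() to move each 'Trans' value into its own column so you have the TOTAL for each subgroup, and then just sum across the columns: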

data = pd.DataFrame({"Car":["Hyundai","Hyundai","Honda", "Honda"], "Type":["Accent", "Accent", "Civic", "Civic"], "Trans":["Auto", "Manual", "Auto", "Manual"], "TOTAL":[2,4,5,3]}).set_index(['Car', 'Type', 'Trans'])

total_by_trans = data.unstack().loc[:, 'TOTAL']         # to get rid of the column MultiIndex created by unstack()
total_by_trans['Total'] = total_by_trans.sum(axis=1)    
total_by_trans.columns.name = None                      # just cleaning up

                Auto  Manual  Total
Car     Type                       
Honda   Civic      5       3      8
Hyundai Accent     2       4      6
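
As a variation on the steps above, here is a minimal sketch that starts from the groupby() call in the question and unstacks its result instead of using set_index(). The name `wide` is only illustrative. One reason you might prefer this form: groupby() sums repeated ('Car', 'Type', 'Trans') combinations first, whereas unstack() on an index with duplicate entries raises an error.

```python
import pandas as pd

data = pd.DataFrame({
    "Car": ["Hyundai", "Hyundai", "Honda", "Honda"],
    "Type": ["Accent", "Accent", "Civic", "Civic"],
    "Trans": ["Auto", "Manual", "Auto", "Manual"],
    "TOTAL": [2, 4, 5, 3],
})

# Sum TOTAL per (Car, Type, Trans), then move the Trans level into columns.
wide = data.groupby(["Car", "Type", "Trans"])["TOTAL"].sum().unstack("Trans")

# Row-wise total across the Auto and Manual columns.
wide["Total"] = wide.sum(axis=1)

# Remove the leftover "Trans" label on the column axis (cosmetic only).
wide.columns.name = None

print(wide)
```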

Answer 1 (score: 1)

You can prepare two new series in the data frame ahead of time, holding the automatic and manual counts.

data['total_manual'] = data['TOTAL'] * (data['Trans'] == 'Manual').astype(int)
data['total_auto'] = data['TOTAL'] * (data['Trans'] == 'Auto').astype(int)
print(data.groupby(['Car', 'Type'])[['total_auto', 'total_manual', 'TOTAL']].sum())  # a list of columns needs double brackets

A similar approach is to use a pivot table with margins.

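pvt = pd.pivot_table(data, index=['Car', 'Type'], columns='Trans', values='TOTAL', margins=True, aggfunc='sum')  # margins expects a bool; the string 'sum' avoids needing numpy
pvt = pvt.drop(('All', ''), axis=0)  # drop the grand-total row, keep the 'All' column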
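
For completeness, here is a self-contained sketch of the pivot-table route that runs on its own. It assumes you want the margin column labelled 'Total' as in the question's desired output, so it passes margins_name='Total' (that argument is my addition, not part of the answer above).

```python
import pandas as pd

data = pd.DataFrame({
    "Car": ["Hyundai", "Hyundai", "Honda", "Honda"],
    "Type": ["Accent", "Accent", "Civic", "Civic"],
    "Trans": ["Auto", "Manual", "Auto", "Manual"],
    "TOTAL": [2, 4, 5, 3],
})

# margins=True appends a row and a column of totals;
# margins_name renames them from the default "All" to "Total".
pvt = pd.pivot_table(
    data,
    index=["Car", "Type"],
    columns="Trans",
    values="TOTAL",
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)

# The margin row is keyed ("Total", "") on the two-level index; drop it
# so only the per-car rows and the "Total" column remain.
pvt = pvt.drop(("Total", ""), axis=0)

print(pvt)
```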