Condensing rows with pandas groupby

Time: 2017-10-27 20:30:29

Tags: python pandas dataframe pandas-groupby

I am currently trying to aggregate some CSV data with the groupby function in pandas.

Here is a small sample of the data in my CSV:

Company,School,Number,Type
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,1,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,16,   For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,9,    
Career Education Corporation,Le Cordon Blue College of Culinary Arts,6,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,23,For-Profit

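Right now there is a lot of repetition in the "School" column (Carrington, DeVry, etc.), and I would like to condense it. More specifically, I want one row per unique school, with Number summed over all rows for that school, while keeping the name of the company that owns the school (the first column) and the school's type (the last column).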

The end result would look like this:

Company,School,Number,Type
Adtelem Global Education Inc.,Carrington,18,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,8,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,25,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,29,For-Profit

I used the following code:

data2 = data.groupby("School").sum()

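However, when I do this I also lose the Company and Type associated with each school. I know the solution is probably fairly basic, but I am new to pandas, so any help you can give would be much appreciated!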

2 answers:

Answer 0: (score: 1)

I would do this with groupby + agg:

df.groupby('School', as_index=False)\
    .agg({'Company' : 'first', 'Type' : 'first', 'Number' : 'sum'})

                                    School                        Company  \
0                               Carrington  Adtelem Global Education Inc.   
1              DeVry Institute of Learning  Adtelem Global Education Inc.   
2            DeVry Institute of Technology  Adtelem Global Education Inc.   
3  Le Cordon Blue College of Culinary Arts   Career Education Corporation   

   Number        Type  
0      18  For-Profit  
1      25  For-Profit  
2       8  For-Profit  
3      29  For-Profit 

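I think it is best to aggregate every column explicitly, so that each one has a stated rule for how it is reduced.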
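
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample rows from the question, with the Type column written out cleanly as "For-Profit" on every row (an assumption, since two of the pasted rows have a padded or blank Type). The names `csv_text` and `result` are only illustrative.

```python
import io

import pandas as pd

# Sample rows from the question; Type is assumed to be "For-Profit" everywhere.
csv_text = """Company,School,Number,Type
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,1,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,16,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,9,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,6,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,23,For-Profit
"""

df = pd.read_csv(io.StringIO(csv_text))

# One row per school: keep the first Company and Type seen, sum Number.
result = (
    df.groupby("School", as_index=False)
      .agg({"Company": "first", "Type": "first", "Number": "sum"})
)

# Restore the column order used in the original CSV.
result = result[["Company", "School", "Number", "Type"]]

print(result.to_csv(index=False))
```

Note that "first" silently keeps whichever value appears first, so it only makes sense when Company and Type really are constant within each school.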

Answer 1: (score: 0)

You can pass a list of columns to group by:

data2 = data.groupby(["School", "Company", "Type"]).sum()
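
One caveat with grouping on several columns: rows only land in the same group if every key matches exactly, and by default pandas drops rows where a key is missing. In the sample from the question, one "DeVry Institute of Learning" row has a padded Type and the other has a blank one, which may just be a paste artifact. The sketch below uses a hypothetical four-row sample that mirrors that situation and cleans the Type column before grouping; filling the missing Type from another row of the same school is an assumption about what the asker wants, not something stated in the question.

```python
import pandas as pd

# Hypothetical mini-sample: the last two rows mimic the padded and blank Type values.
data = pd.DataFrame({
    "Company": ["Adtelem Global Education Inc."] * 4,
    "School": [
        "Carrington",
        "Carrington",
        "DeVry Institute of Learning",
        "DeVry Institute of Learning",
    ],
    "Number": [3, 4, 16, 9],
    "Type": ["For-Profit", "For-Profit", "   For-Profit", None],
})

# Strip stray whitespace so "   For-Profit" and "For-Profit" compare equal.
data["Type"] = data["Type"].str.strip()

# Assumption: a missing Type can be taken from another row of the same school.
data["Type"] = data.groupby("School")["Type"].transform("first")

# as_index=False keeps the grouping keys as ordinary columns instead of a MultiIndex.
data2 = data.groupby(["Company", "School", "Type"], as_index=False)["Number"].sum()

print(data2)
```

Without the two cleaning steps, the padded value would form its own group and the row with the missing Type would be left out of the sum.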