I am trying to use the groupby function in Pandas to aggregate some CSV data.
Here is a small sample of the data in my CSV:
Company,School,Number,Type
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,1,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,16, For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,9,
Career Education Corporation,Le Cordon Blue College of Culinary Arts,6,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,23,For-Profit
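As it stands, the same value in the School column (Carrington, DeVry, etc.) is repeated many times, and I would like to condense these rows. More specifically, I want one row per unique school, with Number summed across all of that school's rows, while keeping the name of the company that owns the school (first column) and the school type (last column).
The final result would look like this: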
Company,School,Number,Type
Adtelem Global Education Inc.,Carrington,18,For-Profit,
Adtelem Global Education Inc., DeVry Institute of Technology,8,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,25,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,29,For-Profit
I used the following code:
data2 = data.groupby("School").sum()
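However, when I do this I also lose the Company and Type associated with each school. I know the solution is probably fairly basic, but I am new to Pandas, so any help you can give would be greatly appreciated!
Answer 0 (score: 1)
I would use groupby + agg: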
df.groupby('School', as_index=False)\
.agg({'Company' : 'first', 'Type' : 'first', 'Number' : 'sum'})
School Company \
0 Carrington Adtelem Global Education Inc.
1 DeVry Institute of Learning Adtelem Global Education Inc.
2 DeVry Institute of Technology Adtelem Global Education Inc.
3 Le Cordon Blue College of Culinary Arts Career Education Corporation
Number Type
0 18 For-Profit
1 25 For-Profit
2 8 For-Profit
3 29 For-Profit
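I think it is best to aggregate every column explicitly, so that each one has a stated rule and none is silently dropped.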
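As a self-contained illustration of the same idea, here is a minimal sketch using named aggregation (available in pandas 0.25 and later). It assumes a subset of the clean rows from the question loaded from an in-memory string; the names csv_text and result are only illustrative.

```python
import io

import pandas as pd

# A subset of the rows from the question (clean rows only), for illustration.
csv_text = """Company,School,Number,Type
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,1,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,6,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,23,For-Profit
"""
df = pd.read_csv(io.StringIO(csv_text))

# One row per school: sum Number, keep the first Company and Type seen.
result = df.groupby("School", as_index=False).agg(
    Company=("Company", "first"),
    Number=("Number", "sum"),
    Type=("Type", "first"),
)

# Restore the column order used in the original CSV.
result = result[["Company", "School", "Number", "Type"]]
print(result)
```

Taking the first value is only safe if every row for a school carries the same Company and Type; if that might not hold, it is worth checking before aggregating.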
Answer 1 (score: 0)
You can pass a list of columns to group by:
data2 = data.groupby(["School", "Company", "Type"]).sum()
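One caveat with grouping on several columns, assuming the real CSV contains the irregularities visible in the sample (a leading space in " For-Profit" and an empty Type cell): a differently spelled Type becomes its own group, and by default groupby drops rows whose key is missing. The sketch below shows the effect and one possible cleanup. Filling a missing Type from other rows of the same school is an assumption about the data, not something the question confirms.

```python
import io

import pandas as pd

# Rows from the question that show the irregular Type values.
csv_text = """Company,School,Number,Type
Adtelem Global Education Inc.,DeVry Institute of Learning,16, For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,9,
Career Education Corporation,Le Cordon Blue College of Culinary Arts,6,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,23,For-Profit
"""
data = pd.read_csv(io.StringIO(csv_text))

# Grouping as-is: the row with a missing Type is dropped from the result,
# so DeVry Institute of Learning sums to 16 instead of 25.
naive = data.groupby(["School", "Company", "Type"], as_index=False)["Number"].sum()

# Cleanup (assumption): strip stray whitespace, then fill a missing Type
# with the first non-missing Type found for the same school.
data["Type"] = data["Type"].str.strip()
data["Type"] = data.groupby("School")["Type"].transform("first")

data2 = data.groupby(["Company", "School", "Type"], as_index=False)["Number"].sum()
data2 = data2[["Company", "School", "Number", "Type"]]
print(data2)
```

Passing as_index=False keeps the grouping keys as ordinary columns, which matches the flat layout the question asks for.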