My DataFrame looks like this:
index col1 col2 col3 col4 col5
0 0 Week_1 James John 1 when and why?
1 1 Week_1 James John 3 when and why? How?
2 2 Week_2 James John 2 How far is it? Are you going?
3 3 Week_2 Mark Jim 3 Do you know when?
4 4 Week_2 Andrew Simon 3 What time?
5 5 Week_2 Andrew Simon 6 What time?
How can I group by col2 and col3, and then compute the mean of col4 and the row count for each group?
df.groupby(['col2','col3'], as_index=False).agg({'col4':'mean'}).reset_index()
Output:
index col2 col3 col4
0 0 Andrew Simon 4.5
1 1 James John 2.0
2 2 Mark Jim 3.0
df.groupby(['col2','col3']).size().reset_index()
Output:
col2 col3 0
0 Andrew Simon 2
1 James John 3
2 Mark Jim 1
How can I get a result like this, with the mean and the count side by side? Thanks.
index col2 col3 mean count
0 0 James John 2.0 3
1 3 Mark Jim 3.0 1
2 4 Andrew Simon 4.5 2
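Answer 0 (score: 3)
You can use groupby with agg and named aggregation (this requires pandas 0.25+). Taking the first index value per group lets you sort the result back into the original row order.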
(
    df.groupby(['col2', 'col3'])
      .agg(index=('index', 'first'),
           mean=('col4', 'mean'),
           count=('col4', 'size'))
      .reset_index()
      .sort_values(by='index')
)
col2 col3 index mean count
1 James John 0 2.0 3
2 Mark Jim 3 3.0 1
0 Andrew Simon 4 4.5 2
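For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question. The `sort=False` argument and the final column reordering are variations that are not in the answer above: `sort=False` keeps groups in order of first appearance, so the `sort_values` step is not needed, and the reordering matches the column layout requested in the question. The name `result` is only illustrative.

```python
import pandas as pd

# DataFrame rebuilt by hand from the table in the question
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 4, 5],
    "col1": ["Week_1", "Week_1", "Week_2", "Week_2", "Week_2", "Week_2"],
    "col2": ["James", "James", "James", "Mark", "Andrew", "Andrew"],
    "col3": ["John", "John", "John", "Jim", "Simon", "Simon"],
    "col4": [1, 3, 2, 3, 3, 6],
    "col5": [
        "when and why?",
        "when and why? How?",
        "How far is it? Are you going?",
        "Do you know when?",
        "What time?",
        "What time?",
    ],
})

result = (
    # sort=False keeps groups in order of first appearance
    df.groupby(["col2", "col3"], sort=False)
      # named aggregation (pandas 0.25+): output_name=(column, function)
      .agg(index=("index", "first"),
           mean=("col4", "mean"),
           count=("col4", "size"))
      .reset_index()
)

# Reorder columns to match the layout asked for in the question
result = result[["index", "col2", "col3", "mean", "count"]]
print(result)
```

If you prefer the default sorted grouping, drop `sort=False` and keep `.sort_values(by='index')` as in the answer; both routes should give the same rows, differing only in the row labels on the left.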