I have a dataset in pandas that I have grouped by two factors so that I can sum each group separately. In other words:
grouped = df.groupby(['A','B'])['C'].sum()
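Now I want to take the std of this sum "across" B, so that I can see how that deviation changes for different values of A. How do I perform this kind of aggregation across a "dimension" or "index" of the grouped data?
I'm new to pandas, so this may be easy... but thanks for your help!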
Answer 0 (score: 0)
It looks like you need the level parameter of groupby:
grouped = df.groupby(['A','B'])['C'].sum().groupby(level='B').std()
Sample:
import numpy as np
import pandas as pd
np.random.seed(100)
df = pd.DataFrame(np.random.randint(5, size=(10,3)), columns=list('ABC'))
print (df)
   A  B  C
0  0  0  3
1  0  2  4
2  2  2  2
3  2  1  0
4  0  4  3
5  4  2  0
6  3  1  2
7  3  4  4
8  1  3  4
9  4  3  3
grouped = df.groupby(['A','B'])['C'].sum().groupby(level='B').std().reset_index()
print (grouped)
   B         C
0  0       NaN
1  1  1.414214
2  2  2.000000
3  3  0.707107
4  4  0.707107
grouped = df.groupby(['A','B'])['C'].sum().groupby(level=1).std().reset_index()
print (grouped)
   B         C
0  0       NaN
1  1  1.414214
2  2  2.000000
3  3  0.707107
4  4  0.707107
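The two calls above differ only in how the index level is identified: by name (`level='B'`) or by position (`level=1`). Here is a minimal sketch to check that they agree. It rebuilds the sample frame from the printed values instead of relying on the random seed; the variable names `sums`, `by_name` and `by_position` are just for illustration.

```python
import pandas as pd

# Same data as the printed sample frame above, written out explicitly
df = pd.DataFrame({
    'A': [0, 0, 2, 2, 0, 4, 3, 3, 1, 4],
    'B': [0, 2, 2, 1, 4, 2, 1, 4, 3, 3],
    'C': [3, 4, 2, 0, 3, 0, 2, 4, 4, 3],
})

# Sum of C per (A, B) pair -> Series with a two-level MultiIndex
sums = df.groupby(['A', 'B'])['C'].sum()

# Select the index level by name ...
by_name = sums.groupby(level='B').std()
# ... or by position (A is level 0, B is level 1)
by_position = sums.groupby(level=1).std()

# Series.equals treats NaN in the same position as equal
print(by_name.equals(by_position))
```

Using the name is usually easier to read, and it keeps working if the order of the grouping columns changes later.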
Explanation, step by step:
#groupby by columns A, B, aggregate column C
#->output is Series with MultiIndex
grouped1 = df.groupby(['A','B'])['C'].sum()
print (grouped1)
A  B
0  0    3
   2    4
   4    3
1  3    4
2  1    0
   2    2
3  1    2
   4    4
4  2    0
   3    3
Name: C, dtype: int32
print (type(grouped1))
<class 'pandas.core.series.Series'>
print (grouped1.index)
MultiIndex(levels=[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]],
labels=[[0, 0, 0, 1, 2, 2, 3, 3, 4, 4], [0, 2, 4, 3, 1, 2, 1, 4, 2, 3]],
names=['A', 'B'])
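The repr above comes from an older pandas release; newer versions print the MultiIndex differently (for example, `labels` was renamed to `codes`). If you only want to know which levels are available for `groupby(level=...)`, a small sketch like the following avoids depending on the repr. The frame is rebuilt from the printed sample values, and `sums` and `idx` are illustrative names.

```python
import pandas as pd

# Same data as the printed sample frame, written out explicitly
df = pd.DataFrame({
    'A': [0, 0, 2, 2, 0, 4, 3, 3, 1, 4],
    'B': [0, 2, 2, 1, 4, 2, 1, 4, 3, 3],
    'C': [3, 4, 2, 0, 3, 0, 2, 4, 4, 3],
})

sums = df.groupby(['A', 'B'])['C'].sum()
idx = sums.index

# Names of the index levels; these are the values accepted by level=
print(list(idx.names))

# Number of levels in the MultiIndex
print(idx.nlevels)

# The B value attached to each row of the summed Series
print(list(idx.get_level_values('B')))
```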
#groupby by level B of MultiIndex
#->output is Series indexed by B, so reset_index to get a DataFrame
grouped = grouped1.groupby(level='B').std().reset_index()
print (grouped)
   B         C
0  0       NaN
1  1  1.414214
2  2  2.000000
3  3  0.707107
4  4  0.707107
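After `reset_index()` the value column is still called C, even though it now holds a standard deviation of sums. As an optional variation (not part of the original answer), `Series.reset_index` accepts a `name` argument for the value column. The column name `C_std` below is just a hypothetical choice.

```python
import pandas as pd

# Same data as the printed sample frame, written out explicitly
df = pd.DataFrame({
    'A': [0, 0, 2, 2, 0, 4, 3, 3, 1, 4],
    'B': [0, 2, 2, 1, 4, 2, 1, 4, 3, 3],
    'C': [3, 4, 2, 0, 3, 0, 2, 4, 4, 3],
})

grouped1 = df.groupby(['A', 'B'])['C'].sum()

# Same aggregation as above, but give the value column a descriptive name
out = grouped1.groupby(level='B').std().reset_index(name='C_std')
print(out)
```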
#all together
grouped = df.groupby(['A','B'])['C'].sum().groupby(level='B').std().reset_index()
print (grouped)
   B         C
0  0       NaN
1  1  1.414214
2  2  2.000000
3  3  0.707107
4  4  0.707107
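One caveat: the question asks how the deviation changes for different values of A, which I read as wanting one std per A, computed over the B sums. If that reading is right, group by level A instead of level B. This is a sketch under that assumption. It also shows an equivalent route through `unstack`; the names `std_per_a`, `wide` and `std_per_a_alt` are illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the printed sample frame, written out explicitly
df = pd.DataFrame({
    'A': [0, 0, 2, 2, 0, 4, 3, 3, 1, 4],
    'B': [0, 2, 2, 1, 4, 2, 1, 4, 3, 3],
    'C': [3, 4, 2, 0, 3, 0, 2, 4, 4, 3],
})

sums = df.groupby(['A', 'B'])['C'].sum()

# One std per A value, taken over the sums of the B groups inside that A
std_per_a = sums.groupby(level='A').std()
print(std_per_a)

# Alternative: move B into columns, then take the std along each row.
# Missing (A, B) combinations become NaN and are skipped by default.
wide = sums.unstack('B')
std_per_a_alt = wide.std(axis=1)

print(np.allclose(std_per_a.values, std_per_a_alt.values, equal_nan=True))
```

In this sample, A=1 has only one (A, B) group, so its std is NaN, for the same reason B=0 is NaN in the output above.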