Suppose I have a Pandas DataFrame built from a dict of dicts, where each inner dict holds a 'Group' list of lists, in the following format...
ITEMS = {
    'Item_group1': {'Stuff': 'Some stuff',
                    'More Stuff': 'Extra Stuff',
                    'Group': [['Iteration1', 18, 25, 0], ['Iteration1', 43, 67, 1], ['Iteration1', 87, 76, 1],
                              ['Iteration2', 45, 29, 0], ['Iteration2', 44, 77, 1], ['Iteration2', 43, 74, 0]],
                    },
    'Item_group2': {'Stuff': 'Some stuff',
                    'More Stuff': 'Extra Stuff',
                    'Group': [['Iteration1', 75, 564, 0], ['Iteration1', 21, 87, 1], ['Iteration1', 7, 5, 1],
                              ['Iteration2', 54, 24, 0], ['Iteration2', 7, 45, 1], ['Iteration2', 45, 745, 0]],
                    },
}
The resulting DataFrame looks like this...
Iteration   Value1  Value2  Feature Active
Iteration1      18      25               0
Iteration1       3      67               1
Iteration1      87      76               1
Iteration2      45      29               0
Iteration2      44       7               1
Iteration2      43      74               0
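A minimal sketch of how such a DataFrame could be built from the dict above, assuming each inner list in 'Group' is ordered as [Iteration, Value1, Value2, Feature Active] (column names taken from the table). The `frames` dict and `COLUMNS` list are hypothetical names. Because this reads the dict, two cells differ from the table: the dict has 43 and 77 where the table shows 3 and 7.

```python
import pandas as pd

# Trimmed copy of the ITEMS dict from the question: only the 'Group' lists
# are kept, since the other keys are not used to build the DataFrame.
ITEMS = {
    'Item_group1': {
        'Group': [['Iteration1', 18, 25, 0], ['Iteration1', 43, 67, 1],
                  ['Iteration1', 87, 76, 1], ['Iteration2', 45, 29, 0],
                  ['Iteration2', 44, 77, 1], ['Iteration2', 43, 74, 0]],
    },
    'Item_group2': {
        'Group': [['Iteration1', 75, 564, 0], ['Iteration1', 21, 87, 1],
                  ['Iteration1', 7, 5, 1], ['Iteration2', 54, 24, 0],
                  ['Iteration2', 7, 45, 1], ['Iteration2', 45, 745, 0]],
    },
}

# Assumed column order for each inner list.
COLUMNS = ['Iteration', 'Value1', 'Value2', 'Feature Active']

# One DataFrame per item group, keyed by the group name (hypothetical layout).
frames = {name: pd.DataFrame(item['Group'], columns=COLUMNS)
          for name, item in ITEMS.items()}

Item_group1_DF = frames['Item_group1']
print(Item_group1_DF)
```

Keeping one DataFrame per item group mirrors the question's `Item_group1_DF`; concatenating them with a group-name column would be an alternative if all groups need to be analysed together.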
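How can I separate out and compute the per-iteration mean for rows where 'Feature Active' == 1, ignoring any entries where 'Feature Active' == 0?
I wrote the following code to compute statistics for Value1 and Value2 per iteration, using 'Iteration' and 'Feature Active' as the grouping keys, but the output also includes the 'Feature Active' == 0 groups, which I don't care about.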
FeatureAvgs = Item_group1_DF.groupby(['Iteration', 'Feature Active'])
print(np.round(FeatureAvgs[['Value1', 'Value2']].describe(), decimals=1))
This produces the following output... (ignore the actual numbers; they were taken from a different DataFrame)
Iteration Feature Enabled
Iteration1 0 count 3672.0 3672.0
mean -1352.5 0.0
std 220.5 0.0
min -1920.0 0.0
25% -1507.2 0.0
50% -1267.0 0.0
75% -1184.0 0.0
max -785.0 0.0
1 count 580.0 580.0
mean -1368.6 -1394.5
std 151.5 157.7
min -1788.0 -1805.0
25% -1454.2 -1490.2
50% -1335.5 -1361.0
75% -1270.0 -1291.0
max -1045.0 -1033.0
Iteration2 0 count 20612.0 20612.0
mean -1073.5 0.0
std 142.3 0.0
min -1730.0 0.0
25% -1088.0 0.0
50% -1036.0 0.0
75% -1005.0 0.0
max -805.0 0.0
1 count 14718.0 14718.0
mean -1113.6 -1161.1
std 129.3 134.9
min -1773.0 -1818.0
25% -1151.0 -1214.0
50% -1095.0 -1122.0
75% -1043.0 -1075.0
max -832.0 -897.0
But I'm only after the mean when the feature is active (== 1). Sorry this question is so long; I'm new to Pandas and still reading the docs.
Answer 0 (score: 1)
Instead of filtering the groupby object, you can filter the initial df first:
FeatureAvgs = Item_group1_DF[Item_group1_DF['Feature Active'] == 1].groupby(['Iteration', 'Feature Active'])[['Value1', 'Value2']].mean()
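A runnable sketch of the filter-first approach on the sample table from the question. One small variation is assumed here: after filtering, 'Feature Active' is always 1, so this version groups by 'Iteration' alone. The names `active` and `feature_avgs` are illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'Iteration': ['Iteration1'] * 3 + ['Iteration2'] * 3,
    'Value1': [18, 3, 87, 45, 44, 43],
    'Value2': [25, 67, 76, 29, 7, 74],
    'Feature Active': [0, 1, 1, 0, 1, 0],
})

# Boolean mask keeps only the rows where the feature is active.
active = df[df['Feature Active'] == 1]

# Mean of each value column per iteration, computed on active rows only.
feature_avgs = active.groupby('Iteration')[['Value1', 'Value2']].mean()
print(feature_avgs)
```

Filtering before grouping means the inactive rows never enter the aggregation, so there is nothing to discard afterwards.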
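There is no need to use describe if you only want the mean; just use mean. If you do want to use describe, you can pull the mean column out of its result:
grouped = Item_group1_DF[Item_group1_DF['Feature Active'] == 1].groupby('Iteration')
print(np.round(grouped[['Value1', 'Value2']].describe().xs('mean', axis=1, level=1), decimals=1))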
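A sketch of the describe route on the question's sample table, assuming a reasonably recent pandas in which `GroupBy.describe()` returns columns labelled by (variable, statistic) pairs; `DataFrame.xs` with `axis=1, level=1` then selects the 'mean' statistic for every variable. The intermediate names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'Iteration': ['Iteration1'] * 3 + ['Iteration2'] * 3,
    'Value1': [18, 3, 87, 45, 44, 43],
    'Value2': [25, 67, 76, 29, 7, 74],
    'Feature Active': [0, 1, 1, 0, 1, 0],
})

active = df[df['Feature Active'] == 1]

# Full summary statistics per iteration; columns are a two-level index
# such as ('Value1', 'count'), ('Value1', 'mean'), ...
stats = active.groupby('Iteration')[['Value1', 'Value2']].describe()

# Take the 'mean' entry from the statistic level, leaving Value1/Value2 columns.
means = stats.xs('mean', axis=1, level=1)
print(np.round(means, decimals=1))
```

Calling `.mean()` directly is simpler; going through describe only pays off when other statistics (count, std, quantiles) are also wanted from the same call.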
Answer 1 (score: 0)
If I understand correctly, you can do this:
> df.groupby(["Feature Active", "Iteration"]).mean().loc[1]
Value1 Value2
Iteration
Iteration1 45 71.5
Iteration2 44 7.0
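This groups first by the Feature Active column and then by the Iteration variable. mean() is applied within each group, and .loc[1] then selects the groups whose outer index value is 1, i.e. the Feature Active == 1 groups.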
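A runnable sketch of this approach on the same sample data. It selects `[['Value1', 'Value2']]` explicitly before `.mean()`, an assumption added so that only the value columns are averaged; it also shows `DataFrame.xs` with a named level as an equivalent way to pick the Feature Active == 1 groups.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'Iteration': ['Iteration1'] * 3 + ['Iteration2'] * 3,
    'Value1': [18, 3, 87, 45, 44, 43],
    'Value2': [25, 67, 76, 29, 7, 74],
    'Feature Active': [0, 1, 1, 0, 1, 0],
})

# Group on both keys; the result has a two-level row index
# (Feature Active, Iteration).
grouped_means = df.groupby(['Feature Active', 'Iteration'])[['Value1', 'Value2']].mean()

# .loc[1] picks outer-level value 1 and drops that level,
# leaving a frame indexed by Iteration only.
active_means = grouped_means.loc[1]

# Equivalent selection that names the level explicitly.
same = grouped_means.xs(1, level='Feature Active')
print(active_means)
```

Unlike filtering first, this computes means for the inactive groups too and discards them afterwards, which is harmless on small data but does extra work.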
Given:
> df
Iteration Value1 Value2 Feature Active
0 Iteration1 18 25 0
1 Iteration1 3 67 1
2 Iteration1 87 76 1
3 Iteration2 45 29 0
4 Iteration2 44 7 1
5 Iteration2 43 74 0
> df.groupby(["Feature Active", "Iteration"]).mean()
Value1 Value2
Feature Active Iteration
0 Iteration1 18 25.0
Iteration2 44 51.5
1 Iteration1 45 71.5
Iteration2 44 7.0
Let me know if this is what you wanted.
HTH