Question

Suppose I have a Pandas DataFrame built from the 'Group' lists inside a dict of dicts in the following format...

ITEMS = {
    "Item_group1": {"Stuff": "Some stuff",
                    "More Stuff": "Extra Stuff",
                    "Group": [["Iteration1", 18, 25, 0], ["Iteration1", 43, 67, 1], ["Iteration1", 87, 76, 1],
                              ["Iteration2", 45, 29, 0], ["Iteration2", 44, 77, 1], ["Iteration2", 43, 74, 0]],
                    },
    "Item_group2": {"Stuff": "Some stuff",
                    "More Stuff": "Extra Stuff",
                    "Group": [["Iteration1", 75, 564, 0], ["Iteration1", 21, 87, 1], ["Iteration1", 7, 5, 1],
                              ["Iteration2", 54, 24, 0], ["Iteration2", 7, 45, 1], ["Iteration2", 45, 745, 0]],
                    },
}

...and the resulting DataFrame looks like this:

Iteration   Value1  Value2  Feature Active
Iteration1  18      25      0
Iteration1  3       67      1
Iteration1  87      76      1
Iteration2  45      29      0
Iteration2  44      7       1
Iteration2  43      74      0
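For reference, here is a minimal sketch of how such a DataFrame could be built from the nested dict, assuming the column names shown in the table above (the `COLUMNS` and `frames` names are hypothetical). Note that it uses the values from the dict, which differ slightly from the table (e.g. 43 vs 3).

```python
import pandas as pd

# Trimmed copy of the question's nested dict; only the "Group" list is used.
ITEMS = {
    "Item_group1": {
        "Stuff": "Some stuff",
        "More Stuff": "Extra Stuff",
        "Group": [["Iteration1", 18, 25, 0], ["Iteration1", 43, 67, 1], ["Iteration1", 87, 76, 1],
                  ["Iteration2", 45, 29, 0], ["Iteration2", 44, 77, 1], ["Iteration2", 43, 74, 0]],
    },
}

# Column names assumed from the table in the question.
COLUMNS = ["Iteration", "Value1", "Value2", "Feature Active"]

# Build one DataFrame per item group from that group's list of rows.
frames = {name: pd.DataFrame(group["Group"], columns=COLUMNS) for name, group in ITEMS.items()}
Item_group1_DF = frames["Item_group1"]
print(Item_group1_DF)
```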

How can I split out and compute the per-iteration mean for rows where 'Feature Active' == 1, ignoring any rows where 'Feature Active' == 0?

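I use the following code to compute per-iteration statistics for Value1 and Value2, grouped by 'Iteration' and 'Feature Active' as keys, but the output also includes the 'Feature Active' == 0 groups, which I don't care about.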

import numpy as np

FeatureAvgs = Item_group1_DF.groupby(['Iteration', 'Feature Active'])
print(np.round(FeatureAvgs[['Value1', 'Value2']].describe(), decimals=1))

This produces the following output... (ignore the actual numbers, they were taken from a different DataFrame)

Iteration   Feature Enabled
Iteration1  0               count   3672.0   3672.0
                            mean   -1352.5      0.0
                            std      220.5      0.0
                            min    -1920.0      0.0
                            25%    -1507.2      0.0
                            50%    -1267.0      0.0
                            75%    -1184.0      0.0
                            max     -785.0      0.0
            1               count    580.0    580.0
                            mean   -1368.6  -1394.5
                            std      151.5    157.7
                            min    -1788.0  -1805.0
                            25%    -1454.2  -1490.2
                            50%    -1335.5  -1361.0
                            75%    -1270.0  -1291.0
                            max    -1045.0  -1033.0
Iteration2  0               count  20612.0  20612.0
                            mean   -1073.5      0.0
                            std      142.3      0.0
                            min    -1730.0      0.0
                            25%    -1088.0      0.0
                            50%    -1036.0      0.0
                            75%    -1005.0      0.0
                            max     -805.0      0.0
            1               count  14718.0  14718.0
                            mean   -1113.6  -1161.1
                            std      129.3    134.9
                            min    -1773.0  -1818.0
                            25%    -1151.0  -1214.0
                            50%    -1095.0  -1122.0
                            75%    -1043.0  -1075.0
                            max     -832.0   -897.0

But I'm only after the mean when the feature is active (== 1). Sorry for the long question, but I'm new to Pandas and still reading the docs.

Answer 1

Instead of filtering the groupby object, you can filter the initial df first:

FeatureAvgs = Item_group1_DF[Item_group1_DF['Feature Active'] == 1].groupby(['Iteration', 'Feature Active'])[['Value1', 'Value2']].mean()
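A minimal runnable sketch of the filter-first approach, using the sample rows from the question's table. After filtering, 'Feature Active' is always 1, so this version groups by 'Iteration' alone; that simplification is a choice made here, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the table in the question.
Item_group1_DF = pd.DataFrame(
    [["Iteration1", 18, 25, 0], ["Iteration1", 3, 67, 1], ["Iteration1", 87, 76, 1],
     ["Iteration2", 45, 29, 0], ["Iteration2", 44, 7, 1], ["Iteration2", 43, 74, 0]],
    columns=["Iteration", "Value1", "Value2", "Feature Active"],
)

# Boolean mask keeps only the rows where the feature is active.
active = Item_group1_DF[Item_group1_DF["Feature Active"] == 1]

# Average Value1 and Value2 within each iteration.
FeatureAvgs = active.groupby("Iteration")[["Value1", "Value2"]].mean()
print(FeatureAvgs)
```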

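If you would rather keep using describe and only want the mean, you don't need a separate mean() call; you can take the mean out of the describe result instead (here FeatureAvgs is the groupby object from the question):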

print(np.round(FeatureAvgs[['Value1', 'Value2']].describe().xs('mean', axis=1, level=1), decimals=1))
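A sketch of the describe-then-select variant, assuming a recent pandas version in which GroupBy.describe() returns one row per group with (column, statistic) MultiIndex columns. It also filters on 'Feature Active' == 1 first so the inactive rows do not appear; the `stats` and `means` names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["Iteration1", 18, 25, 0], ["Iteration1", 3, 67, 1], ["Iteration1", 87, 76, 1],
     ["Iteration2", 45, 29, 0], ["Iteration2", 44, 7, 1], ["Iteration2", 43, 74, 0]],
    columns=["Iteration", "Value1", "Value2", "Feature Active"],
)

# Full summary statistics per iteration, only for active rows.
stats = df[df["Feature Active"] == 1].groupby("Iteration")[["Value1", "Value2"]].describe()

# Pick the "mean" entry from the second column level, leaving Value1/Value2 columns.
means = np.round(stats.xs("mean", axis=1, level=1), decimals=1)
print(means)
```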

Answer 2

If I understand correctly, you can do this:

> df.groupby(["Feature Active", "Iteration"]).mean().loc[1]

            Value1  Value2
Iteration                 
Iteration1      45    71.5
Iteration2      44     7.0

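This groups by Feature Active first and by Iteration second. mean() is applied within each group, and .loc[1] then selects the groups whose outer index value is 1, i.e. those with Feature Active == 1.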
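A self-contained sketch of the same groupby-then-.loc approach on the question's sample data. The value columns are selected explicitly before mean(); that is an addition here, to keep the aggregation restricted to the numeric columns.

```python
import pandas as pd

df = pd.DataFrame(
    [["Iteration1", 18, 25, 0], ["Iteration1", 3, 67, 1], ["Iteration1", 87, 76, 1],
     ["Iteration2", 45, 29, 0], ["Iteration2", 44, 7, 1], ["Iteration2", 43, 74, 0]],
    columns=["Iteration", "Value1", "Value2", "Feature Active"],
)

# Group by the flag first so it becomes the outer level of the result's MultiIndex.
grouped = df.groupby(["Feature Active", "Iteration"])[["Value1", "Value2"]].mean()

# .loc[1] selects outer label 1 and drops that level, leaving one row per iteration.
active_means = grouped.loc[1]
print(active_means)
```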

Using this df:

> df

    Iteration  Value1  Value2  Feature Active
0  Iteration1      18      25               0
1  Iteration1       3      67               1
2  Iteration1      87      76               1
3  Iteration2      45      29               0
4  Iteration2      44       7               1
5  Iteration2      43      74               0


> df.groupby(["Feature Active", "Iteration"]).mean()

                           Value1  Value2
Feature Active Iteration                 
0              Iteration1      18    25.0
               Iteration2      44    51.5
1              Iteration1      45    71.5
               Iteration2      44     7.0
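If you prefer to select by level name rather than by position, DataFrame.xs gives the same result. This is a small alternative sketch, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [["Iteration1", 18, 25, 0], ["Iteration1", 3, 67, 1], ["Iteration1", 87, 76, 1],
     ["Iteration2", 45, 29, 0], ["Iteration2", 44, 7, 1], ["Iteration2", 43, 74, 0]],
    columns=["Iteration", "Value1", "Value2", "Feature Active"],
)

grouped = df.groupby(["Feature Active", "Iteration"])[["Value1", "Value2"]].mean()

# Cross-section where the "Feature Active" level equals 1; that level is dropped,
# so the result is indexed by Iteration only.
active_means = grouped.xs(1, level="Feature Active")
print(active_means)
```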

Let me know if this is what you wanted.

HTH
