Question

I have data like this.

I compute the mean of A for each ID:

df.groupby(['ID'], as_index= False)['A'].mean()

Now I want to drop all IDs whose mean is greater than 3.

df.drop(df[df.A > 3].index)

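This is where I'm stuck. I want to save the file in its original format (no grouping, no means), but without the IDs whose mean is greater than 3. Any ideas how I can do this? The output should look something like this. Also, I'd like to know how many unique IDs were removed by the drop.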

Answer 1

Use transform to get a Series with the same size as the original DataFrame, so you can filter with boolean indexing, with the condition changed from > 3 to <= 3:

df1 = df[df.groupby('ID')['A'].transform('mean') <= 3]
print (df1)

   ID  A
0   1  2
1   1  3
2   1  1
6   3  6
7   3  1
8   3  1
9   3  1
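The question's input data is not shown, so here is a minimal reproducible sketch of the line above. The rows for IDs 1 and 3 are taken from the printed output; the ID label 2 and the three values for rows 3 to 5 are assumptions, chosen only so that their mean is 20/3 (about 6.67).

```python
import pandas as pd

# Hypothetical sample data: rows 0-2 and 6-9 match the printed output;
# the ID label (2) and the values for rows 3-5 are assumed.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "A":  [2, 3, 1, 5, 8, 7, 6, 1, 1, 1],
})

# transform('mean') broadcasts each group's mean back onto its rows,
# so the result lines up with df's index and can be used as a mask.
group_mean = df.groupby("ID")["A"].transform("mean")
df1 = df[group_mean <= 3]
print(df1)
```

Because the mask keeps the original index, the surviving rows keep their original labels (0, 1, 2, 6, 7, 8, 9).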

Details:

print (df.groupby('ID')['A'].transform('mean'))

0    2.000000
1    2.000000
2    2.000000
3    6.666667
4    6.666667
5    6.666667
6    2.250000
7    2.250000
8    2.250000
9    2.250000
Name: A, dtype: float64

print (df.groupby('ID')['A'].transform('mean') <= 3)

0     True
1     True
2     True
3    False
4    False
5    False
6     True
7     True
8     True
9     True
Name: A, dtype: bool
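The question also asks how many unique IDs were removed and how to save the result in the original format. One possible sketch reuses the same boolean mask for both. The sample values for ID 2 are assumed, as the input data is not shown, and the names `keep`, `removed_ids` and `csv_text` are illustrative only.

```python
import pandas as pd

# Hypothetical sample data; the ID label (2) and the values for rows 3-5 are assumed.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "A":  [2, 3, 1, 5, 8, 7, 6, 1, 1, 1],
})

# True for rows whose group mean is at most 3.
keep = df.groupby("ID")["A"].transform("mean") <= 3

# IDs that appear only in the rejected rows are the ones being removed.
removed_ids = df.loc[~keep, "ID"].unique()
n_removed = len(removed_ids)
print(n_removed, removed_ids)

# Original columns only, no group means. to_csv returns a string when no
# path is given; pass a file path instead to write the result to disk.
csv_text = df[keep].to_csv(index=False)
```

Since the mask is computed per group, an ID is either kept whole or dropped whole, so counting unique IDs in the rejected rows gives the number of removed IDs.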

Answer 2

Another solution, using groupby and filter. This solution is slower than using transform with boolean indexing.

df.groupby('ID').filter(lambda x: x['A'].mean() <= 3)

Output:
