Question

dfclean = dfclean[dfclean['Count'] > 1]

I use this to clean out 'Count' values < 1 from my DataFrame. Valid values for the 'Count' column range from 0 to 3, and this works well.

dfsorted = dfbottom.groupby("ST").filter(lambda dfbottom:dfbottom.shape[0] > 1)

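I use this to filter out 'ST' values that occur < 1 times. I only want values in the DataFrame that have > 1 instances. I arrived at this code after some time on Stack Overflow, and it is the version that works.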

dfbottom = dfbottom[dfbottom.groupby("ST").count() > 1]

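If possible, I would like help understanding why this does not work. In my mind it should do a similar cleanup: look at the 'ST' column, count the values, and keep the data where the count is > 1. Instead, the DataFrame ends up filled with NaN values. If I run just the dfbottom.groupby(...) part, I get a table of True and False. That table is correct, but I am clearly missing the right way to build a new DataFrame from it.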

Answer 1

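The problem is that .count aggregates the DataFrame, so the result has one row per group rather than one row per original row.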
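To see why the attempt in the question produces NaN values rather than an error, it may help to look at the labels on both sides. The following is a minimal sketch on made-up data; the 'Count' values are an assumption added only so the aggregated result has a column to compare.

```python
import pandas as pd

# Hypothetical sample data; the 'Count' values are invented for illustration.
df = pd.DataFrame({"ST": list("abbbcec"), "Count": [0, 1, 2, 3, 1, 2, 3]})

# count() aggregates: one row per group, indexed by the 'ST' values.
mask = df.groupby("ST").count() > 1
print(mask.index.tolist())  # ['a', 'b', 'c', 'e']
print(df.index.tolist())    # [0, 1, 2, 3, 4, 5, 6]

# Indexing with a boolean DataFrame behaves like df.where(mask): the mask is
# aligned to df by label. No row label of df appears in the mask's index,
# so no cell is matched by a True and every cell is replaced with NaN.
result = df[mask]
print(result.isna().all().all())
```

The mask itself is a correct True/False table, but it is labelled by group, not by row, so pandas cannot line it up with the rows of the original DataFrame.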

The solution is to use GroupBy.transform, which returns a Series of the same size as the original DataFrame, so it can be used for filtering:

dfbottom = dfbottom[dfbottom.groupby("ST")['ST'].transform('count') > 1]

Sample:

import pandas as pd

dfbottom = pd.DataFrame({'ST':list('abbbcec')})
print (dfbottom)
  ST
0  a
1  b
2  b
3  b
4  c
5  e
6  c

# stored under a new name so the original 7-row dfbottom stays available
filtered = dfbottom[dfbottom.groupby("ST")['ST'].transform('count') > 1]
print (filtered)
  ST
1  b
2  b
3  b
4  c
6  c

Details (computed on the original 7-row dfbottom):

print (dfbottom.groupby("ST")['ST'].transform('count'))
0    1
1    3
2    3
3    3
4    2
5    1
6    2
Name: ST, dtype: int64

print (dfbottom.groupby("ST")['ST'].transform('count') > 1)
0    False
1     True
2     True
3     True
4     True
5    False
6     True
Name: ST, dtype: bool

If you try to filter by the aggregated values instead:

print (dfbottom.groupby("ST")['ST'].count())
ST
a    1
b    3
c    2
e    1
Name: ST, dtype: int64

print (dfbottom.groupby("ST")['ST'].count() > 1)
ST
a    False
b     True
c     True
e    False
Name: ST, dtype: bool

print (dfbottom[dfbottom.groupby("ST")['ST'].count() > 1])

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

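This does not work because the boolean mask has a different size and a different index: in this sample its length is 4, while the original DataFrame has 7 rows.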
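If you do want to start from the aggregated counts, one option is to map them back onto the rows first. This is only a sketch of an alternative, not something the question asked for; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"ST": list("abbbcec")})

# One count per group, indexed by the 'ST' values (length 4).
counts = df.groupby("ST")["ST"].count()

# Look up each row's 'ST' value in counts. The result has length 7 and
# shares df's index, so it can be used as a boolean mask.
via_map = df[df["ST"].map(counts) > 1]

# The two approaches already shown in this thread, for comparison.
via_transform = df[df.groupby("ST")["ST"].transform("count") > 1]
via_filter = df.groupby("ST").filter(lambda g: g.shape[0] > 1)

print(via_map)
print(via_map.equals(via_transform))
print(via_map.equals(via_filter))
```

All three keep the same rows on this sample; transform is usually the most direct because it produces the row-aligned Series in one step.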

Trying to understand why a comparison does not work but filter does (pandas)
