This is the question I answered here: Pandas groupby selecting only one value based on 2 groups and converting rest to 0
I have a pandas DataFrame with a datetime index that looks like this:
df =
          Fruit  Quantity
01/02/10  Apple         4
01/02/10  Apple         6
01/02/10  Apple        12
01/02/10   Pear         7
01/02/10  Grape         8
01/02/10  Grape         5
02/02/10  Apple         2
02/02/10  Fruit         6
02/02/10   Pear         8
02/02/10   Pear         5
02/02/10  Apple         2
02/02/10  Apple         2
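Now, for each date and each fruit, I want to keep only two values (preferably the first two) and set the remaining values for that fruit on that date to 0. So the desired output is: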
          Fruit  Quantity
01/02/10  Apple         4
01/02/10  Apple         6
01/02/10  Apple         0
01/02/10   Pear         7
01/02/10  Grape         8
01/02/10  Grape         5
02/02/10  Apple         2
02/02/10  Fruit         6
02/02/10   Pear         8
02/02/10   Pear         5
02/02/10  Apple         2
02/02/10  Apple         0
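For a reproducible starting point, here is a minimal sketch that rebuilds the sample frame and records the desired Quantity column from the expected output. Parsing the index as day/month/year (`%d/%m/%y`) is an assumption, since the question does not say which convention `01/02/10` uses; the names `dates`, `fruit`, `quantity`, and `expected` are illustrative only.

```python
import pandas as pd

# Sample data from the question. Treating "01/02/10" as day/month/year is an
# assumption; the question does not state the date convention.
dates = ["01/02/10"] * 6 + ["02/02/10"] * 6
fruit = ["Apple", "Apple", "Apple", "Pear", "Grape", "Grape",
         "Apple", "Fruit", "Pear", "Pear", "Apple", "Apple"]
quantity = [4, 6, 12, 7, 8, 5, 2, 6, 8, 5, 2, 2]

df = pd.DataFrame(
    {"Fruit": fruit, "Quantity": quantity},
    index=pd.to_datetime(dates, format="%d/%m/%y"),
)

# Desired Quantity column, copied from the expected output in the question:
# the third Apple on each date becomes 0.
expected = [4, 6, 0, 7, 8, 5, 2, 6, 8, 5, 2, 0]
print(df)
```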
This is just a small example, but my main DataFrame has more than 3 million rows, and the results for each date are not necessarily correct.
Answer (score: 2)
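Group by the date (the index) and Fruit, number the rows within each group with cumcount, and then set Quantity to 0 for rows whose count is greater than 1. Because cumcount starts at 0, this keeps the first two rows of each group and zeroes the third and later ones: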
# cumcount numbers the rows of each (date, Fruit) group 0, 1, 2, ... in their
# current order; keep Quantity where that number is below 2, otherwise use 0.
df['QuantityTrimmed'] = df.Quantity.where(df.groupby([df.index, df.Fruit]).cumcount() < 2, 0)
print(df)
#          Fruit  Quantity  QuantityTrimmed
#01/02/10  Apple         4                4
#01/02/10  Apple         6                6
#01/02/10  Apple        12                0
#01/02/10   Pear         7                7
#01/02/10  Grape         8                8
#01/02/10  Grape         5                5
#02/02/10  Apple         2                2
#02/02/10  Fruit         6                6
#02/02/10   Pear         8                8
#02/02/10   Pear         5                5
#02/02/10  Apple         2                2
#02/02/10  Apple         2                0
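A self-contained sketch of the same cumcount approach that also keeps the intermediate group positions, so you can see why `< 2` keeps exactly the first two rows per group. The helper `keep_first_n` and its `n` parameter are hypothetical generalizations, not part of the answer above; the index is left as plain strings for brevity.

```python
import pandas as pd

dates = ["01/02/10"] * 6 + ["02/02/10"] * 6
fruit = ["Apple", "Apple", "Apple", "Pear", "Grape", "Grape",
         "Apple", "Fruit", "Pear", "Pear", "Apple", "Apple"]
quantity = [4, 6, 12, 7, 8, 5, 2, 6, 8, 5, 2, 2]
df = pd.DataFrame({"Fruit": fruit, "Quantity": quantity}, index=dates)


def keep_first_n(frame, n=2):
    """Hypothetical helper: keep the first n Quantity values of each
    (index, Fruit) group in row order and replace the rest with 0."""
    # Position of each row inside its group: 0, 1, 2, ...
    pos = frame.groupby([frame.index, frame["Fruit"]]).cumcount()
    return frame["Quantity"].where(pos < n, 0)


# Show the positions cumcount assigns, then the trimmed column.
df["pos"] = df.groupby([df.index, df["Fruit"]]).cumcount()
df["QuantityTrimmed"] = keep_first_n(df, n=2)
print(df)
```

Because cumcount numbers rows in their current order, "first two" means the first two as they appear in the frame; if a different order is intended, sort before grouping. Both the grouping and the masking are vectorized pandas operations with no Python loop over groups, which is generally the approach you want at the multi-million-row scale mentioned in the question.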