Question

I'm playing around with pandas. Given

day  pokemon     date                          cp
14   Abra        2016-11-14 14:08:37.205617    377
                 2016-11-14 22:47:02.467526    374
     Bellsprout  2016-11-14 09:02:41.420506    460
                 2016-11-14 09:31:29.026961    541
                 2016-11-14 09:42:49.151360    125

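I want to add a new column holding, for each pokemon value, the mean of cp within its group. As you may have guessed, this is a MultiIndex structure in which (day, pokemon, date) is the index tuple.

So far I have tried to solve this by merging this DataFrame with the one obtained by grouping by day and pokemon and applying the mean. As a result, I lose the date field, and I still cannot merge it back into the DataFrame I posted above.

My expected result is something like this: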

day  pokemon     date                          cp   mean
14   Abra        2016-11-14 14:08:37.205617    377  
                 2016-11-14 22:47:02.467526    374  375.5
     Bellsprout  2016-11-14 09:02:41.420506    460
                 2016-11-14 09:31:29.026961    541
                 2016-11-14 09:42:49.151360    125  375.3

How would you solve this? Thanks, FB

Answer 1

I think you first need transform, and then need to set the repeated values to NaN with a mask built from duplicated and boolean indexing:

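# Group by the first two index levels (day, pokemon); group_keys=False keeps the original index on apply results
g = df.groupby(level=[0,1], group_keys=False)
# Broadcast each group's mean cp to all of its rows
df['mean'] = g['cp'].transform('mean')
# Keep the mean only on the last row of each group; the other rows become NaN
df['mean'] = df['mean'][g['mean'].apply(lambda x: ~x.duplicated(keep='last'))]
print(df)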
                                            cp        mean
day pokemon    date                                       
14  Abra       2016-11-14 14:08:37.205617  377         NaN
               2016-11-14 22:47:02.467526  374  375.500000
    Bellsprout 2016-11-14 09:02:41.420506  460         NaN
               2016-11-14 09:31:29.026961  541         NaN
               2016-11-14 09:42:49.151360  125  375.333333
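
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and, as a variation on the code above rather than the same method, builds the mask from the index itself (`droplevel` plus `duplicated`) and applies it with `Series.mask` instead of `groupby.apply`. The variable names `rows`, `group_mean` and `not_last` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; the index names follow the post.
rows = [
    (14, "Abra", "2016-11-14 14:08:37.205617", 377),
    (14, "Abra", "2016-11-14 22:47:02.467526", 374),
    (14, "Bellsprout", "2016-11-14 09:02:41.420506", 460),
    (14, "Bellsprout", "2016-11-14 09:31:29.026961", 541),
    (14, "Bellsprout", "2016-11-14 09:42:49.151360", 125),
]
df = pd.DataFrame(rows, columns=["day", "pokemon", "date", "cp"])
df["date"] = pd.to_datetime(df["date"])
df = df.set_index(["day", "pokemon", "date"])

# Broadcast the mean of cp to every row of its (day, pokemon) group.
group_mean = df.groupby(level=["day", "pokemon"])["cp"].transform("mean")

# True for every row except the last one of each (day, pokemon) group.
not_last = df.index.droplevel("date").duplicated(keep="last")

# Replace the mean with NaN on all rows but the last of each group.
df["mean"] = group_mean.mask(not_last)
print(df)
```

Because the mask is derived from the index alone, it does not depend on how `groupby.apply` labels its result, which has varied between pandas versions.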

Applying a group aggregation on a MultiIndex
