Question

Given a DataFrame df:

                            yellowCard secondYellow redCard
match_id          player_id                                
1431183600x96x30  76921              X          NaN     NaN
                  76921            NaN            X       X
1431192600x162x32 71174              X          NaN     NaN

I want to merge the duplicated rows (those that share the same index), so that the result is:

                            yellowCard secondYellow redCard
match_id          player_id                                
1431183600x96x30  76921              X            X       X
1431192600x162x32 71174              X          NaN     NaN

Does pandas provide a library method that does this?

Answer 1

It looks like your df is multi-indexed on match_id and player_id, so I would group by match_id and fill the NaN values twice, first with ffill and then with bfill:

In [184]:
df.groupby(level=0).fillna(method='ffill').groupby(level=0).fillna(method='bfill')

Out[184]:
                             yellowCard  secondYellow  redCard
match_id          player_id                                   
1431183600x96x30  76921               1             2        2
                  76921               1             2        2
1431192600x162x32 71174               3           NaN      NaN

I built the DataFrame above with the following code, using numbers instead of the X values:

In [185]:
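import io
import pandas as pd

# Sample data: whitespace-separated, first two columns become the MultiIndex
t="""match_id player_id yellowCard secondYellow redCard
1431183600x96x30  76921              1          NaN     NaN
1431183600x96x30  76921            NaN           2       2
1431192600x162x32 71174              3          NaN     NaN"""
df=pd.read_csv(io.StringIO(t), sep=r'\s+', index_col=[0,1])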
df

Out[185]:
                             yellowCard  secondYellow  redCard
match_id          player_id                                   
1431183600x96x30  76921               1           NaN      NaN
                  76921             NaN             2        2
1431192600x162x32 71174               3           NaN      NaN

Edit: groupby objects have ffill and bfill methods, so this simplifies to:

In [189]:
df.groupby(level=0).ffill().groupby(level=0).bfill()

Out[189]:
                             yellowCard  secondYellow  redCard
match_id          player_id                                   
1431183600x96x30  76921               1             2        2
                  76921               1             2        2
1431192600x162x32 71174               3           NaN      NaN

You can then call drop_duplicates:

In [190]:
df.groupby(level=0).ffill().groupby(level=0).bfill().drop_duplicates()

Out[190]:
                             yellowCard  secondYellow  redCard
match_id          player_id                                   
1431183600x96x30  76921               1             2        2
1431192600x162x32 71174               3           NaN      NaN
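
As a self-contained sketch of the steps above, the snippet below rebuilds the question's frame with the original string X values and groups on both index levels rather than only match_id. One caveat worth keeping in mind: drop_duplicates compares column values and ignores the index, so two different players with identical cards would also collapse into one row. The sketch therefore deduplicates on the index instead. The variable names (raw, filled, result) are only illustrative.

```python
import io
import pandas as pd

# Rebuild the question's frame; read_csv parses the literal "NaN" as missing.
raw = """match_id player_id yellowCard secondYellow redCard
1431183600x96x30  76921 X NaN NaN
1431183600x96x30  76921 NaN X X
1431192600x162x32 71174 X NaN NaN"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col=[0, 1])

# Fill forward, then backward, within each (match_id, player_id) group.
filled = df.groupby(level=[0, 1]).ffill().groupby(level=[0, 1]).bfill()

# Keep one row per index entry; unlike drop_duplicates, this looks at
# the index labels rather than at the column values.
result = filled[~filled.index.duplicated(keep="first")]
print(result)
```

Grouping on both levels is an assumption about the intent: if a match can contain several players, grouping on match_id alone would fill values across players.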

Answer 2

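If you do

df.groupby(level=['match_id', 'player_id']).min()

the default behaviour is to ignore NaN values. For a DataFrame of the form shown in the example (where every inconsistency is between NaN and a filled value), this will do the job. Note that match_id and player_id are index levels here, so they are passed via level rather than as columns.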
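
A minimal sketch of this aggregation, assuming numeric placeholders in the same shape as the question's data (the names raw and merged are illustrative):

```python
import io
import pandas as pd

# Numeric stand-ins for the X values, as in the first answer's sample.
raw = """match_id player_id yellowCard secondYellow redCard
1431183600x96x30  76921 1 NaN NaN
1431183600x96x30  76921 NaN 2 2
1431192600x162x32 71174 3 NaN NaN"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col=[0, 1])

# min() skips NaN within each group, so each column keeps its one filled value;
# a column that is NaN in every row of a group stays NaN.
merged = df.groupby(level=["match_id", "player_id"]).min()
print(merged)
```

This only gives the intended result when each group has at most one distinct non-NaN value per column; otherwise min() silently picks the smallest.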

Edit

I assumed the X values were placeholders for floats. For strings, use a combination of ffill and bfill, as in EdChum's answer (which should be accepted).
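
For string values there is also a possible alternative that neither answer uses: the groupby first() aggregation, which by default returns the first non-null entry of each column within a group. The following is a sketch under that assumption, with illustrative names (raw, collapsed):

```python
import io
import pandas as pd

# The question's frame with the original string X values.
raw = """match_id player_id yellowCard secondYellow redCard
1431183600x96x30  76921 X NaN NaN
1431183600x96x30  76921 NaN X X
1431192600x162x32 71174 X NaN NaN"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col=[0, 1])

# first() takes the first non-null value per column in each group,
# producing one row per (match_id, player_id) pair.
collapsed = df.groupby(level=["match_id", "player_id"]).first()
print(collapsed)
```

Like the min() approach, this assumes the duplicated rows never disagree on a non-NaN value; if they do, the first one encountered wins.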
