Update rows that share the same index

Time: 2015-06-02 09:57:18

Tags: python pandas

Given the DataFrame df

                            yellowCard secondYellow redCard
match_id          player_id                                
1431183600x96x30  76921              X          NaN     NaN
                  76921            NaN            X       X
1431192600x162x32 71174              X          NaN     NaN

I want to merge the duplicated rows (those with the same index) so that the result is:

                            yellowCard secondYellow redCard
match_id          player_id                                
1431183600x96x30  76921              X            X       X
1431192600x162x32 71174              X          NaN     NaN

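Does pandas provide a library method to do this?

2 Answers:

Answer 0: (score: 2)

It looks like your df is multi-indexed on match_id and player_id, so I would group by match_id and fill the NaN values twice, first with ffill and then with bfill: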

In [184]:
df.groupby(level=0).fillna(method='ffill').groupby(level=0).fillna(method='bfill')

Out[184]:
                             yellowCard  secondYellow  redCard
match_id          player_id                                   
1431183600x96x30  76921               1             2        2
                  76921               1             2        2
1431192600x162x32 71174               3           NaN      NaN

I built the DataFrame above with the following code, using numeric values instead of X:

In [185]:
import io
import pandas as pd

t="""match_id player_id yellowCard secondYellow redCard
1431183600x96x30  76921              1          NaN     NaN
1431183600x96x30  76921            NaN           2       2
1431192600x162x32 71174              3          NaN     NaN"""
df=pd.read_csv(io.StringIO(t), sep=r'\s+', index_col=[0,1])
df

Out[185]:
                             yellowCard  secondYellow  redCard
match_id          player_id                                   
1431183600x96x30  76921               1           NaN      NaN
                  76921             NaN             2        2
1431192600x162x32 71174               3           NaN      NaN

Edit: groupby objects have ffill and bfill methods, so this simplifies to:

In [189]:
df.groupby(level=0).ffill().groupby(level=0).bfill()

Out[189]:
                             yellowCard  secondYellow  redCard
match_id          player_id                                   
1431183600x96x30  76921               1             2        2
                  76921               1             2        2
1431192600x162x32 71174               3           NaN      NaN

You can then call drop_duplicates:

In [190]:
df.groupby(level=0).ffill().groupby(level=0).bfill().drop_duplicates()

Out[190]:
                             yellowCard  secondYellow  redCard
match_id          player_id                                   
1431183600x96x30  76921               1             2        2
1431192600x162x32 71174               3           NaN      NaN
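
The chain above groups on match_id only and then drops rows whose column values repeat. As a hedged variation (not the answerer's code), the sketch below groups on both index levels, so values are never carried between different players in the same match, and it deduplicates on the index rather than on the column values. The names keys, filled and result are illustrative.

```python
import io

import pandas as pd

t = """match_id player_id yellowCard secondYellow redCard
1431183600x96x30  76921              1          NaN     NaN
1431183600x96x30  76921            NaN           2       2
1431192600x162x32 71174              3          NaN     NaN"""
df = pd.read_csv(io.StringIO(t), sep=r"\s+", index_col=[0, 1])

# Group on both index levels (an assumption about the intended key).
keys = ["match_id", "player_id"]

# Fill forward, then backward, within each (match_id, player_id) group.
filled = df.groupby(level=keys).ffill().groupby(level=keys).bfill()

# Keep only the first row for each repeated index entry.
result = filled[~filled.index.duplicated(keep="first")]
print(result)
```

Deduplicating on the index means two different players who happen to have identical card values are both kept, whereas drop_duplicates would keep only one of them.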

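Answer 1: (score: 1)

If you do

df.groupby(level=['match_id', 'player_id']).min()

then min() ignores NaN values by default. For a DataFrame of the form in the example (where every discrepancy is between NaN and a filled value), this will do the job.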
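
A minimal, self-contained sketch of this groupby-and-min approach on the numeric sample data from the other answer. It assumes match_id and player_id are index levels, as in the question, which is why the grouping uses level= rather than column access; the name merged is illustrative.

```python
import io

import pandas as pd

t = """match_id player_id yellowCard secondYellow redCard
1431183600x96x30  76921              1          NaN     NaN
1431183600x96x30  76921            NaN           2       2
1431192600x162x32 71174              3          NaN     NaN"""
df = pd.read_csv(io.StringIO(t), sep=r"\s+", index_col=[0, 1])

# One output row per (match_id, player_id); min() skips NaN within each group,
# and a column that is NaN for the whole group stays NaN.
merged = df.groupby(level=["match_id", "player_id"]).min()
print(merged)
```

If two rows of the same group held different non-NaN values in one column, min() would silently keep the smaller one, so this only suits data where rows differ by NaN alone.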

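Edit

I assumed the X values were placeholders for floats. For strings, use a combination of ffill and bfill, as in EdChum's answer (which should be accepted).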
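
For the string case, here is a hedged sketch that rebuilds the question's DataFrame with literal "X" markers and applies the ffill/bfill combination. The construction code and the names index, keys, filled and result are assumptions made for illustration; grouping on both index levels and deduplicating on the index are choices of this sketch, not part of either answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame with string markers instead of numbers.
index = pd.MultiIndex.from_tuples(
    [
        ("1431183600x96x30", 76921),
        ("1431183600x96x30", 76921),
        ("1431192600x162x32", 71174),
    ],
    names=["match_id", "player_id"],
)
df = pd.DataFrame(
    {
        "yellowCard": ["X", np.nan, "X"],
        "secondYellow": [np.nan, "X", np.nan],
        "redCard": [np.nan, "X", np.nan],
    },
    index=index,
)

keys = ["match_id", "player_id"]

# Fill forward, then backward, within each group of identical index entries.
filled = df.groupby(level=keys).ffill().groupby(level=keys).bfill()

# Keep one row per index entry.
result = filled[~filled.index.duplicated(keep="first")]
print(result)
```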