Suppose I have the following DataFrame:
df = pd.DataFrame({'player': ['LBJ', 'LBJ', 'LBJ', 'Kyrie', 'Kyrie', 'LBJ', 'LBJ'],
'points': [25, 32, 26, 21, 29, 21, 35]})
How can I do the opposite of ffill, so that I get the following DataFrame:
df = pd.DataFrame({'player': ['LBJ', np.nan, np.nan, 'Kyrie', np.nan, 'LBJ', np.nan],
'points': [25, 32, 26, 21, 29, 21, 35]})
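In other words, I want to replace consecutive repeated values with NaN.
This is what I have so far, but I'm hoping there is a built-in pandas method or a better approach: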
for i, (index, row) in enumerate(df.iterrows()):
    if i == 0:
        continue
    go_back = 1
    while True:
        # df.ix was removed in pandas 1.0; use a positional lookup instead
        past_player = df['player'].iloc[i - go_back]
        if pd.isnull(past_player):
            go_back += 1
            continue
        if row['player'] == past_player:
            # df.set_value was removed in pandas 1.0; df.at replaces it
            df.at[index, 'player'] = np.nan
        break
Answer 0 (score: 3)
ffinv = lambda s: s.mask(s == s.shift())
df.assign(player=ffinv(df.player))
  player  points
0    LBJ      25
1    NaN      32
2    NaN      26
3  Kyrie      21
4    NaN      29
5    LBJ      21
6    NaN      35
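To see why this works, the one-liner can be unpacked into its steps. The sketch below is self-contained; the intermediate names `prev`, `is_repeat`, `masked`, and `restored` are introduced here only for illustration and are not part of the answer. It also shows that `ffill` undoes the operation, assuming the original column contains no NaN of its own.

```python
import pandas as pd

df = pd.DataFrame({'player': ['LBJ', 'LBJ', 'LBJ', 'Kyrie', 'Kyrie', 'LBJ', 'LBJ'],
                   'points': [25, 32, 26, 21, 29, 21, 35]})

s = df['player']
prev = s.shift()            # value from the row above (missing for the first row)
is_repeat = s == prev       # True where a row repeats the previous row's value
masked = s.mask(is_repeat)  # replace the repeats with NaN, keep everything else

result = df.assign(player=masked)
print(result)

# Forward-filling the masked column restores the original values
# (assumption: the original column had no NaN to begin with).
restored = masked.ffill()
```

Because the comparison is only against the immediately preceding row, a value that reappears after a different one (the second run of 'LBJ' here) is kept rather than masked.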
Answer 1 (score: 1)
This may not be the most efficient solution, but it works using itertools.groupby and itertools.chain:
>>> import itertools
>>> df['player'] = list(itertools.chain.from_iterable(
...     [key] + [float('nan')] * (len(list(val)) - 1)
...     for key, val in itertools.groupby(df['player'].tolist())))
>>> df
  player  points
0    LBJ      25
1    NaN      32
2    NaN      26
3  Kyrie      21
4    NaN      29
5    LBJ      21
6    NaN      35
To illustrate more concretely how it works:
for key, val in itertools.groupby(df['player']):
    print([key] + [float('nan')] * (len(list(val)) - 1))
which gives:
['LBJ', nan, nan]
['Kyrie', nan]
['LBJ', nan]
These lists are then "chained" together into a single sequence.
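The same idea can be wrapped in a small helper so that the grouping and chaining steps are explicit. This is only a sketch: the function name `blank_repeats` and the variable names are hypothetical, and it assumes the input contains no missing values to begin with.

```python
import itertools

def blank_repeats(values):
    """Keep the first value of each consecutive run; replace the rest with NaN."""
    runs = []
    for key, group in itertools.groupby(values):
        run_length = len(list(group))  # number of consecutive equal values
        runs.append([key] + [float('nan')] * (run_length - 1))
    # chain.from_iterable flattens the per-run lists into one flat sequence
    return list(itertools.chain.from_iterable(runs))

players = ['LBJ', 'LBJ', 'LBJ', 'Kyrie', 'Kyrie', 'LBJ', 'LBJ']
blanked = blank_repeats(players)
print(blanked)
```

The returned list has the same length as the input, so it could be assigned back to the column with `df['player'] = blank_repeats(df['player'].tolist())`.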