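I have the following DataFrame with NA values that need to be filled.
I want to fill them with incrementing values, as shown below: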
import pandas as pd
data = [[1,1 ],[1, 1 ], [2, None], [3, None]]
df = pd.DataFrame(data, columns = ['user', 'days_unseen'])
#current behavior of ffill, leaves value the same
df['value'] = df['days_unseen'].ffill()
print(df)
#desired fill - increments last value by 1
desired_data = [[1,1 ],[1, 1 ], [2, 2], [3, 3]]
desired_df = pd.DataFrame(desired_data, columns = ['user', 'days_unseen'])
print(desired_df)
Answer 0 (score: 3)
Combine Series.isna with Series.cumsum to count the missing values, then add the last non-missing value, obtained by forward-filling:
df['value'] = df['days_unseen'].isna().cumsum() + df['days_unseen'].ffill()
print(df)
   user  days_unseen  value
0     1          1.0    1.0
1     1          1.0    1.0
2     2          NaN    2.0
3     3          NaN    3.0
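Note that the cumulative count above runs over the whole column, so it matches the desired output when there is a single run of missing values, as in the question. If the column could contain several separate runs of NaN, the count would need to restart after each non-missing value. A minimal sketch of that variation follows; the helper name `fill_incrementing` and the sample Series are assumptions for illustration, not part of the original answer:

```python
import pandas as pd


def fill_incrementing(s: pd.Series) -> pd.Series:
    """Fill each run of NaN with last_valid + 1, last_valid + 2, ...

    Hypothetical helper, not from the original answer.
    """
    missing = s.isna()
    # The run id increases at every non-missing value, so each valid value
    # and the NaN entries that follow it share one id.
    run_id = s.notna().cumsum()
    # Count the missing entries within each run: 1, 2, 3, ...
    steps = missing.astype(int).groupby(run_id).cumsum()
    # Add the per-run count to the forward-filled last valid value.
    return s.ffill() + steps


# Assumed sample data with two separate runs of NaN.
example = pd.Series([1, None, None, 5, None])
filled = fill_incrementing(example)
print(filled.tolist())
```

Leading NaN values, which have no previous valid value to build on, stay NaN under this sketch.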
Answer 1 (score: 1)
You can use a helper Series to compute the incrementing values for the run of NaN values:
s = df.days_unseen.shift().loc[df.days_unseen.isna()]
s = pd.Series(data=1, index=s.index).cumsum() + s.ffill()
Then you can use it to fill the null values in the original DataFrame:
df['days_unseen'] = df['days_unseen'].fillna(s)
This gives the expected result:
   user  days_unseen
0     1          1.0
1     1          1.0
2     2          2.0
3     3          3.0
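To make the helper Series easier to follow, here is the same approach broken into named intermediate steps. This is only a sketch: the variable names `prev`, `offsets`, `base`, `fill_values` and `result` are assumptions introduced for illustration, and the DataFrame is rebuilt from the question's data so the block runs on its own.

```python
import pandas as pd

# Rebuild the question's data so this block is self-contained.
df = pd.DataFrame({"user": [1, 1, 2, 3], "days_unseen": [1, 1, None, None]})

# Previous row's value for every row.
prev = df["days_unseen"].shift()

# Keep only the rows where days_unseen is missing (index 2 and 3 here).
s = prev.loc[df["days_unseen"].isna()]

# Running count 1, 2, ... over the missing rows.
offsets = pd.Series(1, index=s.index).cumsum()

# Last known value, carried forward across the missing rows.
base = s.ffill()

# Values to write into the gaps: last known value plus the running count.
fill_values = offsets + base

# fillna aligns on the index, so only the missing rows are replaced.
result = df["days_unseen"].fillna(fill_values)
print(result.tolist())
```

As with the first answer, the running count here spans all missing rows together, so this sketch matches the question's single-run case.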