I am working with a pandas DataFrame like this:
ID have time
1 NaN 2010-07-01
1 1 2010-07-08
1 5 2011-07-08
1 NaN 2011-08-08
1 NaN 2012-05-08
1 NaN 2012-09-08
1 1 2012-10-08
2 NaN 2013-01-18
2 1 2013-02-18
2 NaN 2013-03-18
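I want to fill the missing values within each ID group (each individual), carrying forward that individual's most recent non-missing value only if it was recorded within the previous year: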
ID have want time
1 NaN NaN 2010-07-01
1 1 1 2010-07-08
1 5 5 2011-07-08
1 NaN 5 2011-08-08
1 NaN 5 2012-05-08
1 NaN NaN 2012-09-08
1 1 1 2012-10-08
2 NaN NaN 2013-01-18
2 1 1 2013-02-18
2 NaN 1 2013-03-18
Is there an efficient way to do this?
I am currently using the following code, which seems to work row by row:
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [1.0, np.nan, np.nan, "2010-07-01"],
    [1.0, "1", "1", "2010-07-08"],
    [1.0, "5", "5", "2011-07-08"],
    [1.0, np.nan, "5", "2011-08-08"],
    [1.0, np.nan, "5", "2012-05-08"],
    [1.0, np.nan, np.nan, "2012-09-08"],
    [1.0, "1", "1", "2012-10-08"],
    [2.0, np.nan, np.nan, "2013-01-18"],
    [2.0, "1", "1", "2013-02-18"],
    [2.0, np.nan, "1", "2013-03-18"]
], columns=['ID', 'have', 'want', 'time'])
df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')
def want(df):
    for ind, row in df.iterrows():
        # Start from the row's own date as the date of the last observation
        df.loc[ind, 'ewant'] = df.loc[ind, 'time']
        # If the value is missing, carry the previous row's observation date forward
        if ind != 0 and pd.isnull(df.loc[ind, 'have']):
            df.loc[ind, 'ewant'] = df.loc[ind - 1, 'ewant']
        # Days elapsed since the last observation
        days = (df.loc[ind, 'time'] - df.loc[ind, 'ewant']).days
        df.loc[ind, 'timespan'] = days
        # Flag rows whose last observation is at most 365 days old
        df.loc[ind, 'impu'] = int(0 < days <= 365)
    return df

want(df)
But when I try to apply it at the 'ID' group level:
want(df.groupby(['ID']))
I get this iteration error:
AttributeError: Cannot access callable attribute 'iterrows' of 'DataFrameGroupBy' objects, try using the 'apply' method
Is there a way to get around this iteration error? Thanks!
Answer 0 (score: 0)
This is a perfect use case for merge_asof:
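# Lookup table with only the rows that have an observed value
df1 = df[['ID', 'have', 'time']].dropna()
# For each row, take the latest observation with the same ID at most 365 days earlier.
# (Timedelta no longer accepts unit='M' in recent pandas, so 12 months is written as 365 days.)
df = pd.merge_asof(df[['ID', 'have', 'time']], df1, by='ID', on='time', tolerance=pd.Timedelta(days=365))
df  # have_y is the column you want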
ID have_x time have_y
0 1 NaN 2010-07-01 NaN
1 1 1.0 2010-07-08 1.0
2 1 5.0 2011-07-08 5.0
3 1 NaN 2011-08-08 5.0
4 1 NaN 2012-05-08 5.0
5 1 NaN 2012-09-08 NaN
6 1 1.0 2012-10-08 1.0
7 2 NaN 2013-01-18 NaN
8 2 1.0 2013-02-18 1.0
9 2 NaN 2013-03-18 1.0
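
As for the groupby error itself: a DataFrameGroupBy object has no iterrows, so the row loop cannot be called on it directly. A minimal sketch of an alternative that stays at the ID group level, assuming a one-year window means 365 days, is to forward-fill both the value and the date it was observed within each ID, then blank out any fill that is too old. The function name fill_within_year and the max_gap parameter are made up for illustration, and the sample frame uses numeric values rather than the strings from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with 'have' stored as numbers (an assumption)
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
    "have": [np.nan, 1, 5, np.nan, np.nan, np.nan, 1, np.nan, 1, np.nan],
    "time": pd.to_datetime([
        "2010-07-01", "2010-07-08", "2011-07-08", "2011-08-08", "2012-05-08",
        "2012-09-08", "2012-10-08", "2013-01-18", "2013-02-18", "2013-03-18",
    ]),
})


def fill_within_year(frame, max_gap=pd.Timedelta(days=365)):
    """Forward-fill 'have' per ID, keeping a fill only if its source row is at most max_gap old."""
    frame = frame.sort_values(["ID", "time"])
    # Date of each observed value; NaT where 'have' is missing
    obs_time = frame["time"].where(frame["have"].notna())
    # Carry the last observed value and its date forward within each ID
    last_value = frame.groupby("ID")["have"].ffill()
    last_time = obs_time.groupby(frame["ID"]).ffill()
    # Comparisons against NaT are False, so rows with no earlier observation stay missing
    recent = (frame["time"] - last_time) <= max_gap
    return frame.assign(want=last_value.where(recent))


result = fill_within_year(df)
print(result)
```

On the sample data this should match the want column from the question. Unlike merge_asof, it does not need the time column to be sorted across the whole frame, only within each ID, which the sort_values call takes care of.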