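How can I separate the complete and incomplete rows of a dataset with pandas in Python? (I need to split them to get the training and test sets for an imputation model.) After imputation, how do I put the imputed rows back at their original index?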
Answer (score: 1)
You can use dropna() together with isnull() (or its inverse, notnull()) for this:
import numpy as np
import pandas as pd

#creating a dummy dataset (np.nan marks a missing value; np.NAN was removed in NumPy 2.0)
s=[1,2,3,4,np.nan,5]
s1=[1,2,np.nan,np.nan,3,4]
s2=[1,2,3,np.nan,np.nan,np.nan]
df=pd.DataFrame({'r1':s,'r2':s1,'r3':s2})
#reset_index copies the original row labels into a column named 'index', used later to restore the order
df=df.reset_index()
#getting the rows without null values (the complete rows, e.g. training data for the imputation model)
not_nulls=df.dropna()
#getting only the rows with at least one null value
nulls=df[df.isnull().any(axis=1)]
#fill the null values using the required logic; here I'm just filling with zero
nulls=nulls.fillna(0)
#combining the filled rows and the complete rows
combined=pd.concat([nulls,not_nulls])
#sorting by the saved 'index' column to get back the original order
combined=combined.sort_values(by='index')
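A minimal sketch of the same idea without the extra 'index' column, assuming the DataFrame's index is unique: pandas keeps the original index labels through boolean indexing, so the split rows can be put back either with sort_index() or by writing them into a copy of the original frame with .loc. The column-mean fill is only a placeholder for a real imputation model, and the names complete_mask, train, to_impute, imputed, combined and restored are illustrative.

```python
import numpy as np
import pandas as pd

# Same dummy data as above; np.nan marks a missing value
df = pd.DataFrame({
    "r1": [1, 2, 3, 4, np.nan, 5],
    "r2": [1, 2, np.nan, np.nan, 3, 4],
    "r3": [1, 2, 3, np.nan, np.nan, np.nan],
})

# True for rows in which every column has a value
complete_mask = df.notnull().all(axis=1)

train = df[complete_mask]        # complete rows: fit the imputer on these
to_impute = df[~complete_mask]   # rows with at least one gap

# Boolean indexing keeps the original index labels
print(train.index.tolist())      # [0, 1]
print(to_impute.index.tolist())  # [2, 3, 4, 5]

# Placeholder imputation (an assumption, not a real model): fill each
# column's gaps with that column's mean over the complete rows
imputed = to_impute.fillna(train.mean())

# Option 1: concatenate and sort by the preserved index labels
combined = pd.concat([imputed, train]).sort_index()

# Option 2: write the imputed rows back into a copy of the original frame
restored = df.copy()
restored.loc[imputed.index] = imputed

print(combined.index.tolist())   # [0, 1, 2, 3, 4, 5]
print(combined.loc[4].tolist())  # [1.5, 3.0, 1.5]
```

Option 2 keeps the original frame's index type and column order untouched. If the index is not unique, the reset_index() approach in the answer is the safer choice, because label-based assignment with duplicate labels can write to more rows than intended.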