I have a dataframe like the one below:
import numpy as np
import pandas as pd
data_file = pd.DataFrame({'person_id': [1,1,1,1,2,2,2,3,3,3],
                          'ob.date': [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
                          'observation': ['Age','interviewdate','marital_status','interviewdate','Age','interviewdate','marital_status','Age','interviewdate','marital_status'],
                          'answer': [21,'21/08/2017','Single','22/05/2217', 26,'11/03/2010','Single',41,'31/09/2012','Married']
                          })
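What I would like to do is extract the date values from the answer column and put them in the ob.date column. The dataframe above shows that person_id = 1 answered the Age question on 21/08/2017 and the marital_status question on 22/05/2217.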
This is what I tried, based on a suggestion from another SO post:
s = data_file[(data_file.observation == 'interviewdate')].set_index('person_id')['answer']
data_file['ob.date'] = data_file['person_id'].map(s)
But this doesn't work, because I get a duplicate index error. How can I avoid that problem and still keep it reasonably efficient?
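One way to see where the duplicate index error comes from is to inspect the lookup Series `s` built above. A minimal sketch, reusing the question's frame: person_id 1 has two interviewdate rows, so the label 1 appears twice in the index of `s`, and `Series.map` has no single date to hand to the person_id 1 rows.

```python
import numpy as np
import pandas as pd

# Same frame as in the question (ob.date starts out empty).
data_file = pd.DataFrame({
    'person_id': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'ob.date': [np.nan] * 10,
    'observation': ['Age', 'interviewdate', 'marital_status', 'interviewdate',
                    'Age', 'interviewdate', 'marital_status',
                    'Age', 'interviewdate', 'marital_status'],
    'answer': [21, '21/08/2017', 'Single', '22/05/2217', 26, '11/03/2010',
               'Single', 41, '31/09/2012', 'Married'],
})

# The lookup Series from the question: person_id -> interview date.
s = data_file[data_file.observation == 'interviewdate'].set_index('person_id')['answer']

# person_id 1 owns two interview dates, so the index of s is not unique;
# a label-based lookup such as Series.map needs a unique index.
print(s.index.is_unique)                       # False
print(s.index[s.index.duplicated()].tolist())  # [1]
print(s.loc[1].tolist())                       # ['21/08/2017', '22/05/2217']
```

Because a person can have several interviews, a per-person lookup cannot work here; the date has to be attached by row position within each person's block instead.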
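Any elegant and efficient solution would be helpful. Person_id = 1 has two date values, so please fill every row above an interviewdate observation with the value from that interviewdate observation's answer column.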
I expect my output to look like this:
Answer (score: 2)
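It depends on the data. First create the new column from answer where the condition holds, then fill the missing values within each group with a backward fill followed by a forward fill: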
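data_file['ob.date'] = data_file.loc[data_file.observation == 'interviewdate', 'answer']
# transform returns a result aligned to the original row index, so it can be
# assigned straight back (groupby.apply may prepend person_id to the index)
data_file['ob.date'] = (data_file.groupby('person_id')['ob.date']
                          .transform(lambda x: x.bfill().ffill()))
print(data_file)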
   person_id     ob.date     observation      answer
0          1  21/08/2017             Age          21
1          1  21/08/2017   interviewdate  21/08/2017
2          1  22/05/2217  marital_status      Single
3          1  22/05/2217   interviewdate  22/05/2217
4          2  11/03/2010             Age          26
5          2  11/03/2010   interviewdate  11/03/2010
6          2  11/03/2010  marital_status      Single
7          3  31/09/2012             Age          41
8          3  31/09/2012   interviewdate  31/09/2012
9          3  31/09/2012  marital_status     Married
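Since the question also asks about efficiency: the lambda above is called once per group. A sketch of the same bfill-then-ffill logic using the built-in `SeriesGroupBy.bfill` and `SeriesGroupBy.ffill`, which avoid calling a Python function for every group. It has not been benchmarked here, so any speed-up is an assumption to check on real data.

```python
import numpy as np
import pandas as pd

data_file = pd.DataFrame({
    'person_id': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'ob.date': [np.nan] * 10,
    'observation': ['Age', 'interviewdate', 'marital_status', 'interviewdate',
                    'Age', 'interviewdate', 'marital_status',
                    'Age', 'interviewdate', 'marital_status'],
    'answer': [21, '21/08/2017', 'Single', '22/05/2217', 26, '11/03/2010',
               'Single', 41, '31/09/2012', 'Married'],
})

# Keep the answer only on interviewdate rows; all other rows become NaN.
is_date = data_file['observation'] == 'interviewdate'
data_file['ob.date'] = data_file.loc[is_date, 'answer']

# Built-in per-group fills (results keep the original row index):
# bfill copies each interview date up to the rows above it,
# ffill then covers trailing rows with no later interviewdate in the group.
data_file['ob.date'] = data_file.groupby('person_id')['ob.date'].bfill()
data_file['ob.date'] = data_file.groupby('person_id')['ob.date'].ffill()

print(data_file)
```

Grouping by person_id for both fills matters: an ungrouped fill could carry one person's date into a neighbouring person's rows.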
Details:
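First, a backward fill is applied within each group, because each interviewdate row closes a block: all the rows before it belong to the same subgroup. A forward fill is then added because bfill leaves the last NaNs of a group unfilled when no interviewdate row follows them: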
data_file['ob.date'] = data_file.loc[data_file.observation == 'interviewdate', 'answer']
data_file['ob.date'] = (data_file.groupby('person_id')['ob.date']
                          .transform(lambda x: x.bfill()))
print(data_file)
   person_id     ob.date     observation      answer
0          1  21/08/2017             Age          21
1          1  21/08/2017   interviewdate  21/08/2017
2          1  22/05/2217  marital_status      Single
3          1  22/05/2217   interviewdate  22/05/2217
4          2  11/03/2010             Age          26
5          2  11/03/2010   interviewdate  11/03/2010
6          2         NaN  marital_status      Single
7          3  31/09/2012             Age          41
8          3  31/09/2012   interviewdate  31/09/2012
9          3         NaN  marital_status     Married
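The effect of the two fills can be seen on a single group in isolation. A minimal sketch that mimics person_id 2's rows after the first step: only the middle (interviewdate) row carries a date.

```python
import numpy as np
import pandas as pd

# One group after the first step: Age row, interviewdate row, marital_status row.
group = pd.Series([np.nan, '11/03/2010', np.nan])

after_bfill = group.bfill()       # copies the next valid value upward
after_both = after_bfill.ffill()  # copies the last valid value downward

print(after_bfill.tolist())  # ['11/03/2010', '11/03/2010', nan]
print(after_both.tolist())   # ['11/03/2010', '11/03/2010', '11/03/2010']
```

bfill alone leaves the trailing marital_status row empty, which is exactly the NaN in rows 6 and 9 above; the ffill step closes that gap.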