I have two dataframes as shown below.
First dataframe:
import numpy as np
import pandas as pd

data_file = pd.DataFrame({'person_id': [1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3],
                          'event name': ['Second','First','Second','First','Second','First','Second','First','Second','Second','First','Second','First','Second','First','Second','First','First'],
                          'ob.date': [np.nan] * 18})
Second dataframe:
out_data = pd.DataFrame({'person_id':[1,1,2,2,3,3],'event name':['First','Second','First','Second','First','Second'],
'ob.date': ['23/08/2017','23/08/2017','11/08/2017','31/08/2017','25/08/2017','22/08/2017']})
What I would like to do is fill the ob.date column of data_file with the dates from out_data, using person_id and event name as the lookup keys.
Here is what I tried:
s = out_data.set_index(['person_id','event name'])['ob.date']
data_file['ob.date'] = data_file[('person_id','event name')].map(s)
I get the following error:
KeyError: ('person_id', 'event name')
But merge works fine. Is the following correct?
pd.merge(data_file,out_data, on = ['person_id','event name'],how = 'inner')
How can I avoid this error and map the date values based on multiple keys to produce the expected output?
Answer 0 (score: 2)
I think merge with a left join is better here:
df = pd.merge(data_file,out_data, on = ['person_id','event name'], how = 'left')
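One detail worth noting (my addition, not part of the original answer): data_file already carries an all-NaN ob.date column, so a plain merge keeps both copies as ob.date_x and ob.date_y. Dropping the placeholder column first avoids the suffixes; a minimal sketch with shortened sample data:

```python
import numpy as np
import pandas as pd

data_file = pd.DataFrame({
    'person_id': [1, 1, 2],
    'event name': ['Second', 'First', 'First'],
    'ob.date': [np.nan, np.nan, np.nan],
})
out_data = pd.DataFrame({
    'person_id': [1, 1, 2],
    'event name': ['First', 'Second', 'First'],
    'ob.date': ['23/08/2017', '23/08/2017', '11/08/2017'],
})

# Drop the all-NaN placeholder so the merged frame has a single
# ob.date column instead of ob.date_x / ob.date_y.
df = data_file.drop(columns='ob.date').merge(
    out_data, on=['person_id', 'event name'], how='left')
print(df)
```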
map is also possible, but the two key columns first need to be combined into tuples. (Indexing with data_file[('person_id','event name')] treats the tuple as a single column label, which does not exist, hence the KeyError.)
s = out_data.set_index(['person_id','event name'])['ob.date']
s.index = s.index.tolist()  # flatten the MultiIndex into plain tuples
print (s)
(1, First) 23/08/2017
(1, Second) 23/08/2017
(2, First) 11/08/2017
(2, Second) 31/08/2017
(3, First) 25/08/2017
(3, Second) 22/08/2017
Name: ob.date, dtype: object
s1 = pd.Series(list(map(tuple, data_file[['person_id','event name']].values.tolist())),
index=data_file.index)
data_file['ob.date'] = s1.map(s)
Or, similarly:
s1 = data_file.set_index(['person_id','event name']).index.to_series()
s1.index = data_file.index
data_file['ob.date'] = s1.map(s)
print (data_file)
person_id event name ob.date
0 1 Second 23/08/2017
1 1 First 23/08/2017
2 1 Second 23/08/2017
3 1 First 23/08/2017
4 1 Second 23/08/2017
5 1 First 23/08/2017
6 1 Second 23/08/2017
7 2 First 11/08/2017
8 2 Second 31/08/2017
9 2 Second 31/08/2017
10 2 First 11/08/2017
11 3 Second 22/08/2017
12 3 First 25/08/2017
13 3 Second 22/08/2017
14 3 First 25/08/2017
15 3 Second 22/08/2017
16 3 First 25/08/2017
17 3 First 25/08/2017
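The same tuple keys can also be built with zip; this is a sketch of an equivalent variant (my addition, not part of the original answer), shown with shortened sample data:

```python
import numpy as np
import pandas as pd

data_file = pd.DataFrame({
    'person_id': [1, 1, 2],
    'event name': ['Second', 'First', 'First'],
    'ob.date': [np.nan, np.nan, np.nan],
})
out_data = pd.DataFrame({
    'person_id': [1, 1, 2],
    'event name': ['First', 'Second', 'First'],
    'ob.date': ['23/08/2017', '23/08/2017', '11/08/2017'],
})

s = out_data.set_index(['person_id', 'event name'])['ob.date']
s.index = s.index.tolist()  # flatten the MultiIndex into plain tuples

# zip the two key columns into tuples, then map them against s
keys = pd.Series(list(zip(data_file['person_id'], data_file['event name'])),
                 index=data_file.index)
data_file['ob.date'] = keys.map(s)
print(data_file)
```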