How to map based on multiple keys in pandas without merging

Time: 2019-08-20 05:09:01

Tags: python python-3.x pandas dataframe dictionary

I have two dataframes, as shown below.

First dataframe

import numpy as np
import pandas as pd

data_file = pd.DataFrame({'person_id':[1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3],
             'event name': ['Second','First','Second','First','Second','First','Second','First','Second','Second','First','Second','First','Second','First','Second','First','First'],
             'ob.date': [np.nan] * 18  # empty placeholder column, one NaN per row
             })

Second dataframe

out_data = pd.DataFrame({'person_id':[1,1,2,2,3,3],'event name':['First','Second','First','Second','First','Second'],
                     'ob.date': ['23/08/2017','23/08/2017','11/08/2017','31/08/2017','25/08/2017','22/08/2017']})

The first dataframe looks like this:

[image: first dataframe]

The second dataframe looks like this:

[image: second dataframe]

What I would like to do is map the ob.date values from out_data onto the data_file dataframe, based on person_id and event name.

This is what I tried:

s = out_data.set_index(['person_id','event name'])['ob.date']
data_file['ob.date'] = data_file[('person_id','event name')].map(s)

I get the following error:

  

KeyError: ('person_id', 'event name')

# But merge works well. Is the below correct?

pd.merge(data_file,out_data, on = ['person_id','event name'],how = 'inner')

How can I avoid this error, map the date values based on multiple keys, and get the output shown below?

[image: expected output]

1 Answer:

Answer 0 (score: 2):

I think merge with a left join is the best option here:

df = pd.merge(data_file,out_data, on = ['person_id','event name'], how = 'left')
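One thing to watch for: both frames carry an ob.date column, so merging them as they are produces suffixed columns (ob.date_x and ob.date_y) rather than a single filled column. A minimal sketch of one way around that, assuming the all-NaN placeholder column in data_file can simply be dropped before the merge. The small sample frames and the names keys and merged are illustrative only:

```python
import numpy as np
import pandas as pd

# Reduced sample data (illustrative, not the full frames from the question)
data_file = pd.DataFrame({
    'person_id': [1, 1, 2, 3],
    'event name': ['Second', 'First', 'First', 'Second'],
    'ob.date': [np.nan] * 4,
})
out_data = pd.DataFrame({
    'person_id': [1, 1, 2, 2, 3, 3],
    'event name': ['First', 'Second', 'First', 'Second', 'First', 'Second'],
    'ob.date': ['23/08/2017', '23/08/2017', '11/08/2017',
                '31/08/2017', '25/08/2017', '22/08/2017'],
})

keys = ['person_id', 'event name']

# Drop the empty placeholder so only out_data contributes 'ob.date'.
# validate='many_to_one' raises if out_data has duplicate key pairs.
merged = pd.merge(data_file.drop(columns='ob.date'), out_data,
                  on=keys, how='left', validate='many_to_one')
print(merged)
```

With how='left' the rows keep the order of data_file, and any key pair missing from out_data would end up with NaN in ob.date.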

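map is also possible, but the two key columns must first be combined into tuples (data_file[('person_id','event name')] looks for a single column whose label is that tuple, hence the KeyError):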

s = out_data.set_index(['person_id','event name'])['ob.date']
s.index = s.index.tolist()
print (s)
(1, First)     23/08/2017
(1, Second)    23/08/2017
(2, First)     11/08/2017
(2, Second)    31/08/2017
(3, First)     25/08/2017
(3, Second)    22/08/2017
Name: ob.date, dtype: object

s1 = pd.Series(list(map(tuple, data_file[['person_id','event name']].values.tolist())), 
               index=data_file.index)
data_file['ob.date'] = s1.map(s)

Or something like this:

s1 = data_file.set_index(['person_id','event name']).index.to_series()
s1.index = data_file.index
data_file['ob.date'] = s1.map(s)

print (data_file)
   person_id event name     ob.date
0           1     Second  23/08/2017
1           1      First  23/08/2017
2           1     Second  23/08/2017
3           1      First  23/08/2017
4           1     Second  23/08/2017
5           1      First  23/08/2017
6           1     Second  23/08/2017
7           2      First  11/08/2017
8           2     Second  31/08/2017
9           2     Second  31/08/2017
10          2      First  11/08/2017
11          3     Second  22/08/2017
12          3      First  25/08/2017
13          3     Second  22/08/2017
14          3      First  25/08/2017
15          3     Second  22/08/2017
16          3      First  25/08/2017
17          3      First  25/08/2017
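Another merge-free option, shown here as a hedged sketch rather than as part of the original answer, is to reindex the lookup Series by a MultiIndex built from the key columns of data_file. This assumes the key pairs in out_data are unique. The reduced sample data and the names keys, lookup and target are illustrative:

```python
import numpy as np
import pandas as pd

# Reduced sample data (illustrative, not the full frames from the question)
data_file = pd.DataFrame({
    'person_id': [1, 1, 2, 3, 3],
    'event name': ['Second', 'First', 'First', 'Second', 'First'],
    'ob.date': [np.nan] * 5,
})
out_data = pd.DataFrame({
    'person_id': [1, 1, 2, 2, 3, 3],
    'event name': ['First', 'Second', 'First', 'Second', 'First', 'Second'],
    'ob.date': ['23/08/2017', '23/08/2017', '11/08/2017',
                '31/08/2017', '25/08/2017', '22/08/2017'],
})

keys = ['person_id', 'event name']

# Lookup Series indexed by (person_id, event name)
lookup = out_data.set_index(keys)['ob.date']

# One (person_id, event name) entry per row of data_file
target = pd.MultiIndex.from_frame(data_file[keys])

# Align the lookup to the rows of data_file; to_numpy() discards the
# MultiIndex so the values are assigned by position.
data_file['ob.date'] = lookup.reindex(target).to_numpy()
print(data_file)
```

Key pairs that do not occur in out_data come back as NaN, which matches the behaviour of the left merge.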