I have the following dataframe df1:
sports_id  school_id  time         activity_name
1          2          09:00-11:00  soccer match
3          1          08:00-09:00  soccer practice
5          2          08:00-11:00  baseball
and a dataframe df2 that contains the student IDs and one column for each date in December:
student_id  sports_id  school_id  12-01-2018   12-02-2018  12-03-2018   12-04-2018
0001        5          2          08:00-11:00  Rest        08:00-11:00  08:00-09:00
0002        3          1          08:00-09:00  Rest        08:00-09:00  08:00-09:00
0003        1          2          09:00-11:00  Rest        09:00-11:00  09:00-10:00
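Based on sports_id, school_id and time in df1, I want to map activity_name onto each student in df2. Where there is no match, the existing value should stay as it is. The resulting dataframe should look like this: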
student_id  sports_id  school_id  12-01-2018       12-02-2018  12-03-2018       12-04-2018
0001        5          2          baseball         Rest        baseball         08:00-09:00
0002        3          1          soccer practice  Rest        soccer practice  soccer practice
0003        1          2          soccer match     Rest        soccer match     09:00-10:00
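To clarify: if sports_id = 5, school_id = 2 and the time is 08:00-11:00, the value 08:00-11:00 in that df2 row is replaced with "baseball" (as in df1). Since the combination sports_id = 5, school_id = 2, time = 08:00-09:00 does not exist in df1, the value 08:00-09:00 under the date 12-04-2018 is kept unchanged in df2.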
In short, sports_id, school_id and time are the three keys, and activity_name is the value that corresponds to those three keys.
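The "three keys, one value" relationship maps naturally onto a plain Python dict keyed by a tuple. A minimal sketch, assuming df1 is rebuilt from the table above (the name `lookup` is hypothetical, chosen for illustration); `dict.get` with the cell's own value as the default gives the "keep the existing value if there is no match" behaviour:

```python
import pandas as pd

# df1 from the question: three key columns and one value column
df1 = pd.DataFrame({
    "sports_id": [1, 3, 5],
    "school_id": [2, 1, 2],
    "time": ["09:00-11:00", "08:00-09:00", "08:00-11:00"],
    "activity_name": ["soccer match", "soccer practice", "baseball"],
})

# Hypothetical lookup table: (sports_id, school_id, time) -> activity_name
lookup = dict(zip(zip(df1["sports_id"], df1["school_id"], df1["time"]),
                  df1["activity_name"]))

# A matching triple returns the activity name
print(lookup[(5, 2, "08:00-11:00")])                     # baseball
# With no match, dict.get falls back to the original cell value
print(lookup.get((5, 2, "08:00-09:00"), "08:00-09:00"))  # 08:00-09:00
```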
I am trying to do something like this:
df2.applymap(df1.set_index(['sports_id','school_id','time'])['activity_name'])
but it does not work.
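A likely reason it fails: `applymap` calls its function with one cell value at a time, so that function never sees the row's sports_id and school_id, and a Series passed in its place is not callable anyway. The sketch below works row by row instead, so both keys are available. It assumes df1 and df2 are rebuilt from the tables above with student_id kept as strings; `lookup`, `date_cols` and `map_row` are hypothetical names, and this is one possible approach rather than the answer's method:

```python
import pandas as pd

df1 = pd.DataFrame({
    "sports_id": [1, 3, 5],
    "school_id": [2, 1, 2],
    "time": ["09:00-11:00", "08:00-09:00", "08:00-11:00"],
    "activity_name": ["soccer match", "soccer practice", "baseball"],
})
df2 = pd.DataFrame({
    "student_id": ["0001", "0002", "0003"],   # strings keep the leading zeros
    "sports_id": [5, 3, 1],
    "school_id": [2, 1, 2],
    "12-01-2018": ["08:00-11:00", "08:00-09:00", "09:00-11:00"],
    "12-02-2018": ["Rest", "Rest", "Rest"],
    "12-03-2018": ["08:00-11:00", "08:00-09:00", "09:00-11:00"],
    "12-04-2018": ["08:00-09:00", "08:00-09:00", "09:00-10:00"],
})

# (sports_id, school_id, time) -> activity_name
lookup = dict(zip(zip(df1["sports_id"], df1["school_id"], df1["time"]),
                  df1["activity_name"]))
key_cols = ["student_id", "sports_id", "school_id"]
date_cols = [c for c in df2.columns if c not in key_cols]

def map_row(row):
    # Unlike applymap, a whole row gives access to sports_id and school_id
    return row[date_cols].map(
        lambda t: lookup.get((row["sports_id"], row["school_id"], t), t))

result = df2.copy()
result[date_cols] = df2.apply(map_row, axis=1)
print(result)
```

Row-wise `apply` is simple to read but runs Python code per row, so for large frames a reshape-and-merge approach is usually faster.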
Answer:
A somewhat lengthy solution: reshape df2 into long format, merge it with df1, then reshape it back.
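# Wide -> long: one row per (student, date) with the cell value in 'time'; a left merge keeps every student/date row
new_df = df2.set_index(['student_id','sports_id','school_id']).stack().reset_index(name='time').merge(df1, how='left')
# Where nothing matched (e.g. 'Rest' or 09:00-10:00), keep the original cell value
new_df['activity_name'] = new_df['activity_name'].fillna(new_df['time'])
# Long -> wide again; 'level_3' is the auto-named stacked level holding the date labels
new_df = new_df.drop(columns='time').set_index(['student_id','sports_id','school_id','level_3'])['activity_name'].unstack().reset_index()
new_df.columns.name = None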
   student_id  sports_id  school_id  12-01-2018       12-02-2018  12-03-2018       12-04-2018
0  1           5          2          baseball         Rest        baseball         08:00-09:00
1  2           3          1          soccer practice  Rest        soccer practice  soccer practice
2  3           1          2          soccer match     Rest        soccer match     09:00-10:00
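The same reshape, merge, reshape-back idea can also be written with `melt` and `pivot` in place of `stack`/`unstack`. A self-contained sketch, assuming the data from the question with student_id kept as strings (the column name `date` is an arbitrary choice, not from the original); the merge keys are spelled out explicitly and `how="left"` keeps only df2's rows:

```python
import pandas as pd

df1 = pd.DataFrame({
    "sports_id": [1, 3, 5],
    "school_id": [2, 1, 2],
    "time": ["09:00-11:00", "08:00-09:00", "08:00-11:00"],
    "activity_name": ["soccer match", "soccer practice", "baseball"],
})
df2 = pd.DataFrame({
    "student_id": ["0001", "0002", "0003"],
    "sports_id": [5, 3, 1],
    "school_id": [2, 1, 2],
    "12-01-2018": ["08:00-11:00", "08:00-09:00", "09:00-11:00"],
    "12-02-2018": ["Rest", "Rest", "Rest"],
    "12-03-2018": ["08:00-11:00", "08:00-09:00", "09:00-11:00"],
    "12-04-2018": ["08:00-09:00", "08:00-09:00", "09:00-10:00"],
})

key_cols = ["student_id", "sports_id", "school_id"]

# Wide -> long: one row per (student, date); the cell value goes into 'time'
long_df = df2.melt(id_vars=key_cols, var_name="date", value_name="time")

# Left merge on the three keys so every student/date row survives
long_df = long_df.merge(df1, on=["sports_id", "school_id", "time"], how="left")

# No match (e.g. 'Rest' or 09:00-10:00): fall back to the original cell value
long_df["activity_name"] = long_df["activity_name"].fillna(long_df["time"])

# Long -> wide again, one column per date
result = (long_df.pivot(index=key_cols, columns="date", values="activity_name")
          .reset_index())
result.columns.name = None
print(result)
```

If df1 ever contained duplicate (sports_id, school_id, time) triples, the merge would duplicate rows and `pivot` would raise on the repeated (student, date) pairs, so deduplicating df1 first is a sensible guard.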