I have a logic problem I am trying to solve. I have two DataFrames.
Dataframe_one has the following columns:
[Id, workflowprofile_A, workflow_profile_B, important_string_info]
Dataframe_two has the following columns:
[workflowprofile, option, workflow]
My problem is that workflowprofile from Dataframe_two can match workflowprofile_A, workflow_profile_B, or both, in Dataframe_one. How can I get a merged DataFrame whose columns look like this?
dataframe_three:
[Id, workflowprofile_A, workflowprofile_fromA, option_fromA, workflow_fromA, important_string_info_fromA, workflow_profile_B, workflowprofile_fromB, option_fromB, workflow_fromB, important_string_info_fromB]
Answer 0 (score: 2)
If one of the two values is always NaN, you can create a new column with fillna or combine_first and then merge on that column:
df1['workflowprofile'] = df1['workflowprofile_A'].fillna(df1['workflow_profile_B'])
# Alternative with the same result:
# df1['workflowprofile'] = df1['workflowprofile_A'].combine_first(df1['workflow_profile_B'])
df3 = pd.merge(df1, df2, on='workflowprofile')
Sample:
print(df1)
   Id  workflowprofile_A  workflow_profile_B  important_string_info
0   1                7.0                 NaN                      8
1   2                NaN                 5.0                      1
print(df2)
   workflowprofile  option  workflow
0                7       0         0
1                5       9         0
2                7       0         0
3                4       1         2
df1['workflowprofile'] = df1['workflowprofile_A'].fillna(df1['workflow_profile_B'])
df3 = pd.merge(df1, df2, on='workflowprofile')
print(df3)
   Id  workflowprofile_A  workflow_profile_B  important_string_info  \
0   1                7.0                 NaN                      8
1   1                7.0                 NaN                      8
2   2                NaN                 5.0                      1

   workflowprofile  option  workflow
0                7       0         0
1                7       0         0
2                5       9         0
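The fillna approach above relies on exactly one of workflowprofile_A and workflow_profile_B being set per row. If a row can have both set (the "AND" case from the question), one possible sketch is to merge df2 twice, once per key column, giving each copy its own suffix. The sample data and the names from_a and from_b below are illustrative assumptions, not part of the original answer; df2 is assumed to have unique workflowprofile values so that no rows get duplicated.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed): row with Id 3 has both profiles set.
df1 = pd.DataFrame({
    "Id": [1, 2, 3],
    "workflowprofile_A": [7.0, np.nan, 4.0],
    "workflow_profile_B": [np.nan, 5.0, 5.0],
    "important_string_info": [8, 1, 3],
})
df2 = pd.DataFrame({
    "workflowprofile": [7, 5, 4],
    "option": [0, 9, 1],
    "workflow": [0, 0, 2],
})

# Two copies of df2 with every column name suffixed,
# e.g. workflowprofile_fromA, option_fromA, workflow_fromA.
from_a = df2.add_suffix("_fromA")
from_b = df2.add_suffix("_fromB")

# Left merges keep every row of df1; unmatched keys (including NaN)
# simply leave NaN in the suffixed columns.
df3 = (
    df1.merge(from_a, how="left",
              left_on="workflowprofile_A", right_on="workflowprofile_fromA")
       .merge(from_b, how="left",
              left_on="workflow_profile_B", right_on="workflowprofile_fromB")
)

print(df3)
```

This keeps one row per Id and fills the _fromA and _fromB columns independently. Since important_string_info comes from df1 only, it appears once here rather than as separate _fromA and _fromB columns.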