DF1:
   A  B  C
0  1  2  6
1  2  3  6
2  3  4  6
3  4  4  6
DF2:
     A    B  C
0  2.0  3.0  7
1  3.0  3.0  7
2  NaN  NaN  7
3  NaN  NaN  7
4  4.0  4.0  7
Expected:
     A    B  C_x  C_y
0    2    3    6    7
3  NaN  NaN    6    7
4  NaN  NaN    6    7
5    4    4    6    7
I have been trying the code below.
Code:
import numpy as np
import pandas as pd
def get_df_merged_result(df1, df2, join_condition, column_list):
    return pd.merge(df1, df2, how=join_condition, on=column_list)
# Create the DataFrames
df1=pd.DataFrame({'A':[1,2,3,4],'B':[2,3,4,4], 'C':[6,6,6,6]})
df2=pd.DataFrame({'A':[2,3,np.nan,np.nan,4],'B':[3,3,np.nan,np.nan,4],'C':[7,7,7,7,7]})
print(df1)
print('-------------')
print(df2)
print('-------------')
print(get_df_merged_result(df1, df2, 'inner', ['A','B']))
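Can someone help me get a merged result that keeps both the rows with non-null keys and the rows with null keys? I have tried the inner and left join conditions.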
Answer 0 (score: 0)
I'm not sure this is doable with merge alone. Here is an example that uses concat:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'A':[1,2,3,4],'B':[2,3,4,4], 'C':[6,6,6,6]})
df2 = pd.DataFrame({'A':[2,3,np.nan,np.nan,4],'B':[3,3,np.nan,np.nan,4],'C':[7,7,7,7,7]})
# keep the original row positions in an 'index' column (used for ordering later)
df1 = df1.reset_index()
df2 = df2.reset_index()
# inner-join the records on A / B
df = df1.merge(df2, on=['A', 'B'])
df = df.rename(columns={'index_x': 'seq'}).drop(columns=['index_y'])
# select the df2 records where both A and B are NaN
nan_df = df2[(df2['A'].isna()) & (df2['B'].isna())]
nan_df = nan_df.rename(columns={'C': 'C_y'})
# build the C_x / C_y pairs from the joined records and merge them into the NaN records
nan_df = nan_df.merge(df[['C_x', 'C_y']].drop_duplicates(), on=['C_y'])
# union df with joined A / B records and df with NaN records
df = pd.concat([nan_df, df], sort=False).reset_index(drop=True)
def seq(x):
    if pd.isna(x['seq']):
        x['seq'] = x['index']
    return x
# fill in the missing sequence numbers, then sort by the original indexes
df = df.apply(seq, axis=1)
df = df.sort_values(['seq'])
df = df.drop(columns=['index', 'seq'])
print(df.head())
#      A    B  C_y  C_x
# 2  2.0  3.0  7.0  6.0
# 0  NaN  NaN  7.0  6.0
# 1  NaN  NaN  7.0  6.0
# 3  4.0  4.0  7.0  6.0
See the comments in the code. Hope this helps.
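For comparison, here is a more compact sketch of the same merge-then-concat idea, wrapped in a function. It is only an illustration under a few assumptions: the helper name merge_keep_nan_keys and the temporary pos column are made up for this example, the value column is assumed to be called C in both frames, C_x for the NaN rows is borrowed from the matched rows that share the same C_y, and the result is ordered by row position in df2 (the code above mixes df1 and df2 positions instead).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 4], 'C': [6, 6, 6, 6]})
df2 = pd.DataFrame({'A': [2, 3, np.nan, np.nan, 4],
                    'B': [3, 3, np.nan, np.nan, 4],
                    'C': [7, 7, 7, 7, 7]})


def merge_keep_nan_keys(left, right, keys):
    """Hypothetical helper: inner-join on `keys`, then re-attach the rows of
    `right` whose keys are all NaN, borrowing C_x from the matched rows."""
    # Remember each right-hand row's original position ('pos' is a made-up name).
    right = right.reset_index().rename(columns={'index': 'pos'})

    # Ordinary inner join; the shared column C becomes C_x / C_y.
    matched = left.merge(right, on=keys, how='inner')

    # Rows of `right` where every key is missing; they cannot match `left`.
    nan_rows = right[right[keys].isna().all(axis=1)].rename(columns={'C': 'C_y'})

    # Assumption: C_x for those rows comes from matched rows with the same C_y.
    pairs = matched[['C_x', 'C_y']].drop_duplicates()
    nan_rows = nan_rows.merge(pairs, on='C_y', how='inner')

    # Union both parts and restore the order of the right-hand frame.
    out = pd.concat([matched, nan_rows], sort=False)
    out = out.sort_values('pos', kind='mergesort').drop(columns='pos')
    return out.reset_index(drop=True)


result = merge_keep_nan_keys(df1, df2, ['A', 'B'])
print(result[['A', 'B', 'C_x', 'C_y']])
```

On the sample frames this should give four rows: the (2, 3) match, the two NaN rows, then the (4, 4) match, each with C_x = 6 and C_y = 7. If no rows match at all, there are no C_x / C_y pairs to borrow from and the NaN rows are dropped, so treat it as a starting point rather than a general solution.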