Pandas: merge only on the non-null values in columns, and keep the rows that have null values

Time: 2020-06-11 11:19:35

Tags: python pandas

DF1:

   A  B  C
0  1  2  6
1  2  3  6
2  3  4  6
3  4  4  6

DF2:

     A    B  C
0  2.0  3.0  7
1  3.0  3.0  7
2  NaN  NaN  7
3  NaN  NaN  7
4  4.0  4.0  7

Expected:

     A    B  C_x  C_y
0    2    3    6    7
3  NaN  NaN    6    7
4  NaN  NaN    6    7
5    4    4    6    7

I have been trying the code below.

Code:

import numpy as np
import pandas as pd


def get_df_merged_result(df1, df2, join_condition, column_list):
    # Thin wrapper around pd.merge: join df1 and df2 on column_list
    return pd.merge(df1, df2, how=join_condition, on=column_list)

# Create the two DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 4], 'C': [6, 6, 6, 6]})
df2 = pd.DataFrame({'A': [2, 3, np.nan, np.nan, 4], 'B': [3, 3, np.nan, np.nan, 4], 'C': [7, 7, 7, 7, 7]})

print(df1)
print('-------------')
print(df2)
print('-------------')
print(get_df_merged_result(df1, df2, 'inner', ['A','B']))

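Can someone help me get a merged result that covers both the rows with non-null values and the rows with null values in the join columns? I have tried both inner and left join conditions.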

1 Answer:

Answer 0: (score: 0)

I am not sure this is possible with merge alone. Here is an example that uses concat:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A':[1,2,3,4],'B':[2,3,4,4], 'C':[6,6,6,6]})
df2 = pd.DataFrame({'A':[2,3,np.nan,np.nan,4],'B':[3,3,np.nan,np.nan,4],'C':[7,7,7,7,7]})

# keep each frame's original row position in an 'index' column (used for ordering later)
df1 = df1.reset_index()
df2 = df2.reset_index()
# inner-join the records on A / B
df = df1.merge(df2, on=['A', 'B'])
df = df.rename(columns={'index_x': 'seq'}).drop(columns=['index_y'])
# select the df2 records where both A and B are NaN
nan_df = df2[(df2['A'].isna()) & (df2['B'].isna())]
nan_df = nan_df.rename(columns={'C': 'C_y'})
# build the C_x - C_y pairs from the joined records and merge them into the NaN records
nan_df = nan_df.merge(df[['C_x', 'C_y']].drop_duplicates(), on=['C_y'])
# union of the NaN records and the records joined on A / B
df = pd.concat([nan_df, df], sort=False).reset_index(drop=True)

def seq(x):
    if pd.isna(x['seq']):
        x['seq'] = x['index']
    return x

# fill the missing 'seq' values from 'index', then sort by the original row positions
df = df.apply(seq, axis=1)
df = df.sort_values(['seq'])
df = df.drop(columns=['index', 'seq'])
print(df.head())
#      A    B  C_y  C_x
# 2  2.0  3.0  7.0  6.0
# 0  NaN  NaN  7.0  6.0
# 1  NaN  NaN  7.0  6.0
# 3  4.0  4.0  7.0  6.0

See the comments in the code. Hope this helps.
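
For comparison, the same idea can probably be written a little more compactly. This is only a sketch under some assumptions, not a definitive solution: the names `keys`, `right`, `pos`, `has_null_key`, `matched`, `pairs`, `unmatched` and `result` are made up for illustration, rows are ordered by their position in df2, and, as in the code above, the C_x value for the NaN rows is borrowed from the C_x / C_y pairs seen in the matched rows. If C_x should come from somewhere else, that step needs to change.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 4], 'C': [6, 6, 6, 6]})
df2 = pd.DataFrame({'A': [2, 3, np.nan, np.nan, 4],
                    'B': [3, 3, np.nan, np.nan, 4],
                    'C': [7, 7, 7, 7, 7]})

keys = ['A', 'B']  # hypothetical name for the join columns

# Remember each df2 row's original position in a 'pos' column
right = (df2.rename(columns={'C': 'C_y'})
            .reset_index()
            .rename(columns={'index': 'pos'}))
has_null_key = right[keys].isna().any(axis=1)

# Inner-join only the df2 rows whose keys are fully populated
matched = right[~has_null_key].merge(
    df1.rename(columns={'C': 'C_x'}), on=keys, how='inner')

# Assumption: NaN-key rows take C_x from the C_x / C_y pairs of matched rows
pairs = matched[['C_x', 'C_y']].drop_duplicates()
unmatched = right[has_null_key].merge(pairs, on='C_y', how='left')

# Stack both parts and restore the df2 row order
result = (pd.concat([matched, unmatched], sort=False)
            .sort_values('pos')
            .set_index('pos')
            .rename_axis(None)[['A', 'B', 'C_x', 'C_y']])
print(result)
```

With the sample data, the (3, 3) row of df2 is dropped because it has no match in df1, while the two NaN rows are kept. Note that the resulting index holds the df2 row positions, so it will not match the index shown in the expected output of the question.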