Merging two DataFrames based on two columns

Time: 2020-01-09 07:41:30

Tags: python python-3.x pandas dataframe

I have two DataFrames, df1 and df_out. df1 looks like this:

   phone_number1  phone_number2
1    123             123
2    345              0
3    678             678
4    912             912
5    555             666

and df_out looks like this:

    phone_number  address   name
1      123         add1     name1
2     777777       add2     name2
3      666         add3     name3
4      555         add4     name4

I want to produce something like this:

    phone_number  address   name    df1_phone_number1    df1_phone_number2
1      123         add1     name1        123                  123
2     777777       add2     name2        None                 None
3      666         add3     name3        None                 666
4      555         add4     name4        555                  None

I tried the following code, but it does not keep the data that matches phone_number2:

    mergedStuff3 = pd.concat([
            pd.merge(df1, df_out,  how='right', left_on=['phone_number1'], right_on = ['phone_number']),
            pd.merge(df1, df_out,  how='right', left_on=['phone_number2', 'phone_number1'], right_on = ['phone_number', 'phone_number'])

        ])

2 answers:

Answer 0: (score: 1)

I think you need Series.where together with Series.isin, which tests membership:

m1 = df_out['phone_number'].isin(df1['phone_number1'])
m2 = df_out['phone_number'].isin(df1['phone_number2'])
df_out['df1_phone_number1'] = df_out['phone_number'].where(m1)
df_out['df1_phone_number2'] = df_out['phone_number'].where(m2)
print (df_out)
   phone_number address   name  df1_phone_number1  df1_phone_number2
1           123    add1  name1              123.0              123.0
2        777777    add2  name2                NaN                NaN
3           666    add3  name3                NaN              666.0
4           555    add4  name4              555.0                NaN
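To try this end to end, here is a minimal sketch that rebuilds the sample frames from the question and applies the same isin/where idea. The 1-based index is assumed from the printed tables, and the final cast to the nullable "Int64" dtype is my own addition (not part of the answer above), so that matched numbers stay integers instead of turning into floats such as 123.0.

```python
import pandas as pd

# Sample data copied from the question; the 1-based index is an assumption
# read off the printed tables.
df1 = pd.DataFrame(
    {"phone_number1": [123, 345, 678, 912, 555],
     "phone_number2": [123, 0, 678, 912, 666]},
    index=range(1, 6),
)
df_out = pd.DataFrame(
    {"phone_number": [123, 777777, 666, 555],
     "address": ["add1", "add2", "add3", "add4"],
     "name": ["name1", "name2", "name3", "name4"]},
    index=range(1, 5),
)

result = df_out.copy()

# Boolean masks: is each phone_number present in the given df1 column?
m1 = result["phone_number"].isin(df1["phone_number1"])
m2 = result["phone_number"].isin(df1["phone_number2"])

# where() blanks out non-matching rows; the Int64 cast keeps integers
# and shows missing values as <NA> rather than NaN.
result["df1_phone_number1"] = result["phone_number"].where(m1).astype("Int64")
result["df1_phone_number2"] = result["phone_number"].where(m2).astype("Int64")

print(result)
```

Working on a copy leaves df_out untouched, which makes it easier to rerun the snippet while experimenting.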

Or use numpy.where, which keeps the matched values as integers and fills the rest with None:

import numpy as np

m1 = df_out['phone_number'].isin(df1['phone_number1'])
m2 = df_out['phone_number'].isin(df1['phone_number2'])
df_out['df1_phone_number1'] = np.where(m1, df_out['phone_number'], None)
df_out['df1_phone_number2'] = np.where(m2, df_out['phone_number'], None)
print (df_out)
   phone_number address   name df1_phone_number1 df1_phone_number2
1           123    add1  name1               123               123
2        777777    add2  name2              None              None
3           666    add3  name3              None               666
4           555    add4  name4               555              None

If df1 has more columns, generalize the solution by looping over them:

for c in df1.columns:
    m = df_out['phone_number'].isin(df1[c])
    df_out[f'df1_{c}'] = np.where(m, df_out['phone_number'], None)
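The loop above can be wrapped in a small reusable function. This is only a sketch: the helper name add_match_columns and its key and prefix parameters are hypothetical, introduced here for illustration, and the sample frames are rebuilt from the question with a default index.

```python
import numpy as np
import pandas as pd


def add_match_columns(target, lookup, key="phone_number", prefix="df1_"):
    """Hypothetical helper: for every column of `lookup`, add a column to a
    copy of `target` that holds the key value wherever it occurs in that
    lookup column, and None elsewhere."""
    out = target.copy()
    for col in lookup.columns:
        mask = out[key].isin(lookup[col])
        # np.where with None produces an object-dtype column,
        # the same behavior as the loop in the answer above.
        out[f"{prefix}{col}"] = np.where(mask, out[key], None)
    return out


# Sample data copied from the question (default 0-based index).
df1 = pd.DataFrame({"phone_number1": [123, 345, 678, 912, 555],
                    "phone_number2": [123, 0, 678, 912, 666]})
df_out = pd.DataFrame({"phone_number": [123, 777777, 666, 555],
                       "address": ["add1", "add2", "add3", "add4"],
                       "name": ["name1", "name2", "name3", "name4"]})

matched = add_match_columns(df_out, df1)
print(matched)
```

Note that the loop checks every column of the lookup frame, so if df1 also carried non-phone columns you would want to pass only the relevant subset, for example df1[["phone_number1", "phone_number2"]].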

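Answer 1: (score: 0)

Use DataFrame.melt before DataFrame.merge, reshape the matches with pivot_table, then simply join the result back with DataFrame.join.

This works for any number n of phone-number columns in df1: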

df_out.join(df_out.merge(df1.melt(),
                         how = 'inner',
                         left_on ='phone_number',             
                         right_on = 'value')
                  .pivot_table(index = 'phone_number',
                               columns = 'variable',
                               values = 'value'),
            on = 'phone_number')

Output:

   phone_number address   name  phone_number1  phone_number2
1           123    add1  name1          123.0          123.0
2        777777    add2  name2            NaN            NaN
3           666    add3  name3            NaN          666.0
4           555    add4  name4          555.0            NaN
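For anyone who wants to run this answer step by step, here is a sketch that splits the chain into named intermediate frames, using sample data rebuilt from the question. The intermediate names (long_df1, matches, wide) and the add_prefix("df1_") call, which renames the new columns to match the layout requested in the question, are my additions and not part of the answer.

```python
import pandas as pd

# Sample data copied from the question (default 0-based index).
df1 = pd.DataFrame({"phone_number1": [123, 345, 678, 912, 555],
                    "phone_number2": [123, 0, 678, 912, 666]})
df_out = pd.DataFrame({"phone_number": [123, 777777, 666, 555],
                       "address": ["add1", "add2", "add3", "add4"],
                       "name": ["name1", "name2", "name3", "name4"]})

# 1. Long format: one row per (source column, phone number) pair,
#    in columns named "variable" and "value".
long_df1 = df1.melt()

# 2. Keep only the numbers that appear in df_out.
matches = df_out.merge(long_df1, how="inner",
                       left_on="phone_number", right_on="value")

# 3. Back to wide format: one column per original df1 column,
#    indexed by the matched phone number.
wide = matches.pivot_table(index="phone_number",
                           columns="variable",
                           values="value").add_prefix("df1_")

# 4. Attach the wide table to df_out by looking up phone_number in its index.
joined = df_out.join(wide, on="phone_number")
print(joined)
```

pivot_table aggregates with the mean by default; that is harmless here because duplicates of a matched number all carry the same value, but the new columns come out as floats because of the NaN entries.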