Python pandas: get all rows from the left table, plus the rows from the right table that are missing from the left

Time: 2018-09-05 07:32:34

Tags: python python-3.x python-2.7 pandas

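I have a left table and a right table, and I need to combine the FileStamp values from both like this: take every row from the left table, plus every row from the right table whose 'date' is missing from the left table, joined on 'date':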

import pandas as pd
left = pd.DataFrame({'FileStamp': ['T101', 'T102', 'T103', 'T104'], 'date': [20180101, 20180102, 20180103, 20180104]})
right = pd.DataFrame({'FileStamp': ['T501', 'T502'], 'date': [20180104, 20180105]})

Something like

result = pd.merge(left, right, how='outer', on='date')

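But 'outer' is not a good fit here, because it also brings in the right-table value for dates that exist in both tables (T501 for 20180104).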
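To make the problem concrete, here is a minimal sketch that runs the plain outer merge on the sample data and inspects the overlapping date. The names `outer` and `overlap` are introduced only for this illustration.

```python
import pandas as pd

left = pd.DataFrame({'FileStamp': ['T101', 'T102', 'T103', 'T104'],
                     'date': [20180101, 20180102, 20180103, 20180104]})
right = pd.DataFrame({'FileStamp': ['T501', 'T502'],
                      'date': [20180104, 20180105]})

# Plain outer merge: keeps every date found in either table.
outer = pd.merge(left, right, how='outer', on='date')

# 20180104 exists in both tables, so FileStamp_y is filled with 'T501'
# on that row instead of staying empty.
overlap = outer[outer['date'] == 20180104]
print(overlap)
```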

The desired output is

     FileStamp_x      date      FileStamp_y
0        T101       20180101         NaN
1        T102       20180102         NaN
2        T103       20180103         NaN
3        T104       20180104         NaN
4         NaN       20180105        T502

Is there a simple way to get the desired output?

2 answers:

Answer 0 (score: 3)

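Filter the right table with isin before the merge, keeping only the rows whose date does not appear in the left table: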

r = right[~right['date'].isin(left['date'])]
print (r)
  FileStamp      date
1      T502  20180105

result = pd.merge(left, r, how='outer', on='date')
print (result)
  FileStamp_x      date FileStamp_y
0        T101  20180101         NaN
1        T102  20180102         NaN
2        T103  20180103         NaN
3        T104  20180104         NaN
4         NaN  20180105        T502
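If this pattern is needed in more than one place, the two steps can be wrapped in a small function. The following is only a sketch: `left_plus_missing_right` and its `key` parameter are hypothetical names, not part of pandas, and it assumes both frames share a single key column.

```python
import pandas as pd

def left_plus_missing_right(left, right, key):
    """Hypothetical helper: all rows of `left`, plus the rows of `right`
    whose `key` value does not occur in `left`."""
    # Keep only right-hand rows whose key is absent from the left table.
    only_right = right[~right[key].isin(left[key])]
    # The outer merge now cannot match any right-hand row to a left-hand
    # row, so the filtered rows are simply appended with their own column.
    return pd.merge(left, only_right, how='outer', on=key)

left = pd.DataFrame({'FileStamp': ['T101', 'T102', 'T103', 'T104'],
                     'date': [20180101, 20180102, 20180103, 20180104]})
right = pd.DataFrame({'FileStamp': ['T501', 'T502'],
                      'date': [20180104, 20180105]})

result = left_plus_missing_right(left, right, 'date')
print(result)
```

Filtering first keeps the merge itself unchanged, so the column suffixes (`_x`, `_y`) behave exactly as in the answer above.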

Answer 1 (score: 1)

You can adjust the values after the merge, keeping FileStamp_y only on rows where FileStamp_x is missing:

import numpy as np

result = pd.merge(left, right, how='outer', on='date')
# Keep FileStamp_y only where the left table had no row; otherwise set NaN.
result['FileStamp_y'] = np.where(result['FileStamp_x'].isnull(), result['FileStamp_y'], np.nan)

Result:

    FileStamp_x     date  FileStamp_y
0          T101 20180101          NaN
1          T102 20180102          NaN
2          T103 20180103          NaN
3          T104 20180104          NaN
4           NaN 20180105         T502
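The same post-merge adjustment can also be written without NumPy. This is a sketch of an alternative, not what the answer itself uses: `Series.where` keeps a value where the condition is true and replaces it with NaN elsewhere.

```python
import pandas as pd

left = pd.DataFrame({'FileStamp': ['T101', 'T102', 'T103', 'T104'],
                     'date': [20180101, 20180102, 20180103, 20180104]})
right = pd.DataFrame({'FileStamp': ['T501', 'T502'],
                      'date': [20180104, 20180105]})

result = pd.merge(left, right, how='outer', on='date')

# Keep FileStamp_y only on rows that have no left-hand value;
# everywhere else Series.where substitutes NaN.
result['FileStamp_y'] = result['FileStamp_y'].where(result['FileStamp_x'].isna())
print(result)
```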