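I am trying to perform an outer pd.merge on two columns. However, I need the output DataFrame to actually combine the two key columns into one, so that the key has no NaN/NaT values.
Here is an example. Assume the following DataFrames, which are to be merged on their timestamp columns: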
import pandas as pd
a1=['2019-09-01 00:00:00', '2019-09-01 01:00:00', '2019-09-01 03:00:00', '2019-09-10 01:00:00']
a2=['a','c_1','d','f_1']
b1=['2019-09-01 00:10:00', '2019-09-01 01:00:00', '2019-09-01 03:07:00', '2019-09-10 01:00:00']
b2=['b','c_2', 'e', 'f_2']
A=pd.DataFrame({'a1':a1, 'a2':a2})
A.a1=pd.to_datetime(A.a1)
B=pd.DataFrame({'b1':b1, 'b2':b2})
B.b1=pd.to_datetime(B.b1)
The merged DataFrame I want is close to this one:
merged=pd.merge(A,B, left_on='a1', right_on='b1', how='outer', sort=True)
print(merged)
>>>
                   a1   a2                  b1   b2
0 2019-09-01 00:00:00    a                 NaT  NaN
1                 NaT  NaN 2019-09-01 00:10:00    b
2 2019-09-01 01:00:00  c_1 2019-09-01 01:00:00  c_2
3 2019-09-01 03:00:00    d                 NaT  NaN
4                 NaT  NaN 2019-09-01 03:07:00    e
5 2019-09-10 01:00:00  f_1 2019-09-10 01:00:00  f_2
The only difference is that the desired output should have 'a1' and 'b1' combined into a single column. It should look like this:
             datetime   a2   b2   # datetime column has 'a1' and 'b1' merged
0 2019-09-01 00:00:00    a  NaN
1 2019-09-01 00:10:00  NaN    b
2 2019-09-01 01:00:00  c_1  c_2
3 2019-09-01 03:00:00    d  NaN
4 2019-09-01 03:07:00  NaN    e
5 2019-09-10 01:00:00  f_1  f_2
Does anyone have an idea how to do this in a pythonic / pandas-idiomatic way?
Thanks in advance :-)
Answer 0 (score: 0)
You can use the pandas combine_first method after the merge:
merged['datetime'] = merged['a1'].combine_first(merged['b1'])
This takes the value from a1 and, wherever a1 is NA, falls back to the value from b1.
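For completeness, here is a minimal end-to-end sketch of that approach on the data from the question. The names `result` and `alt` are only illustrative, and the explicit `sort_values` / `reset_index` step is an extra precaution rather than something the answer requires. The second variant (renaming both key columns to `datetime` before merging) is a different technique from `combine_first`, included only as a possible alternative.

```python
import pandas as pd

# Data from the question
A = pd.DataFrame({
    'a1': pd.to_datetime(['2019-09-01 00:00:00', '2019-09-01 01:00:00',
                          '2019-09-01 03:00:00', '2019-09-10 01:00:00']),
    'a2': ['a', 'c_1', 'd', 'f_1'],
})
B = pd.DataFrame({
    'b1': pd.to_datetime(['2019-09-01 00:10:00', '2019-09-01 01:00:00',
                          '2019-09-01 03:07:00', '2019-09-10 01:00:00']),
    'b2': ['b', 'c_2', 'e', 'f_2'],
})

# Outer merge on the two differently named key columns
merged = pd.merge(A, B, left_on='a1', right_on='b1', how='outer', sort=True)

# Take a1 where present, otherwise fall back to b1
merged['datetime'] = merged['a1'].combine_first(merged['b1'])

# Keep only the combined key and the value columns (illustrative name: result)
result = (merged[['datetime', 'a2', 'b2']]
          .sort_values('datetime')
          .reset_index(drop=True))

# Alternative (not from the answer): give both keys the same name, then merge on it
alt = pd.merge(A.rename(columns={'a1': 'datetime'}),
               B.rename(columns={'b1': 'datetime'}),
               on='datetime', how='outer', sort=True)

print(result)
```

With either variant the `datetime` column should contain no missing values, while `a2` and `b2` still hold NaN for timestamps that appear in only one of the two frames.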