I have two pandas DataFrames:
>>> import pandas as pd
>>> import numpy as np
>>> df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, np.nan, 3, 4]},
...                    index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])
>>> df1
     a   b
A 1  1 NaN
  2  2 NaN
B 1  3   3
  2  4   4
and
>>> df2 = pd.DataFrame({'b': [1, 2]}, index=[['A','A'], [1, 2]])
>>> df2
     b
A 1  1
  2  2
where df2 contains the data missing from df1. How can I merge the two DataFrames to get the following?
     a  b
A 1  1  1
  2  2  2
B 1  3  3
  2  4  4
I tried pd.concat([df1, df2], axis=1), which gives
     a   b   b
A 1  1 NaN   1
  2  2 NaN   2
B 1  3   3 NaN
  2  4   4 NaN
In my case, the two DataFrames are guaranteed to have no overlapping values.
Answer 0 (score: 3)
You can try combine_first or fillna.
print(df1.combine_first(df2))
     a  b
A 1  1  1
  2  2  2
B 1  3  3
  2  4  4
print(df1.fillna(df2))
     a  b
A 1  1  1
  2  2  2
B 1  3  3
  2  4  4
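The two calls agree here because every row of df2 also exists in df1. They differ once the second frame carries rows that df1 lacks: combine_first returns the union of both indexes, while fillna keeps only df1's rows. A minimal sketch of that difference, assuming a hypothetical df2_extra with an additional ('C', 1) row that is not part of the question:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, np.nan, 3, 4]},
                   index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])

# Hypothetical variant of df2 with one extra row, ('C', 1), absent from df1.
df2_extra = pd.DataFrame({'b': [1, 2, 9]},
                         index=[['A', 'A', 'C'], [1, 2, 1]])

combined = df1.combine_first(df2_extra)  # index is the union of both frames
filled = df1.fillna(df2_extra)           # index stays exactly df1's

print(combined)
print(filled)
```

If the result must have exactly the rows of df1, fillna is probably the safer choice; if rows that exist only in the second frame should be kept, combine_first does that.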
Timings:
In [5]: %timeit df1.combine_first(df2)
The slowest run took 6.01 times longer than the fastest. This could mean that an intermediate result is being cached
100 loops, best of 3: 2.15 ms per loop
In [6]: %timeit df1.fillna(df2)
The slowest run took 5.23 times longer than the fastest. This could mean that an intermediate result is being cached
100 loops, best of 3: 2.76 ms per loop
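The %timeit magic only works inside IPython. Outside it, a comparable measurement can be sketched with the standard-library timeit module. The repeat count n below is an arbitrary assumption, and with only four rows both numbers are likely dominated by fixed overhead, so they say little about how the two methods scale:

```python
import timeit

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, np.nan, 3, 4]},
                   index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])
df2 = pd.DataFrame({'b': [1, 2]}, index=[['A', 'A'], [1, 2]])

n = 200  # arbitrary number of repetitions, chosen for illustration

# timeit.timeit returns the total time in seconds for n calls.
t_combine = timeit.timeit(lambda: df1.combine_first(df2), number=n)
t_fillna = timeit.timeit(lambda: df1.fillna(df2), number=n)

print(f"combine_first: {t_combine / n * 1e3:.3f} ms per call")
print(f"fillna:        {t_fillna / n * 1e3:.3f} ms per call")
```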
Answer 1 (score: 2)
You can also use update, which modifies df1 in place:
In [36]: df1.update(df2)
In [37]: df1
Out[37]:
     a  b
A 1  1  1
  2  2  2
B 1  3  3
  2  4  4
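Unlike combine_first and fillna, update returns None and changes the calling DataFrame itself, so writing df1 = df1.update(df2) would discard the data. A minimal sketch, assuming the original df1 should be preserved (the name patched is only illustrative):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [np.nan, np.nan, 3, 4]},
                   index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])
df2 = pd.DataFrame({'b': [1, 2]}, index=[['A', 'A'], [1, 2]])

patched = df1.copy()             # work on a copy so df1 keeps its NaNs
returned = patched.update(df2)   # mutates patched in place; returns None

print(returned)
print(patched)
print(df1)
```

Note that update, by default, overwrites any matching cell for which the other frame has a non-NA value, not only the NaN cells. That is harmless here because the question guarantees there are no overlapping values.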