Suppose I have three DataFrames:
from pandas import DataFrame, concat
df1 = DataFrame([
[1],
[3],
[4]
],
index=[1, 3, 4],
columns=['value1']
)
df2 = DataFrame([
[5],
[6],
[7],
],
index=[5, 6, 7],
columns=['value2']
)
df3 = DataFrame([
[5, 9],
[6, 10],
[7, 11],
[8, 12]
],
index=[5, 6, 7, 8],
columns=['value1', 'value2']
)
Using
concat([df1, df2, df3], sort=True, axis=1)
gives me
value1 value2 value1 value2
1 1.0 NaN NaN NaN
3 3.0 NaN NaN NaN
4 4.0 NaN NaN NaN
5 NaN 5.0 5.0 9.0
6 NaN 6.0 6.0 10.0
7 NaN 7.0 7.0 11.0
8 NaN NaN 8.0 12.0
How can I get this result instead?
value1 value2
1 1.0 NaN
3 3.0 NaN
4 4.0 NaN
5 5.0 5.0
6 6.0 6.0
7 7.0 7.0
8 8.0 12.0
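In other words, how can I "merge to the left" columns that share the same name? I'm looking for a general solution that handles any number of columns sharing a name (as well as column names that appear only once).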
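The rule being asked for can be stated precisely as "for each column label, take the first non-null value, scanning the columns left to right". As an alternative sketch (not taken from the answer below), that rule can be applied directly to the concatenated frame by grouping on the duplicated labels. This assumes a recent pandas; it transposes instead of using `groupby(..., axis=1)`, which newer pandas versions deprecate.

```python
import pandas as pd

# Rebuild the three frames from the question.
df1 = pd.DataFrame([[1], [3], [4]], index=[1, 3, 4], columns=["value1"])
df2 = pd.DataFrame([[5], [6], [7]], index=[5, 6, 7], columns=["value2"])
df3 = pd.DataFrame([[5, 9], [6, 10], [7, 11], [8, 12]],
                   index=[5, 6, 7, 8], columns=["value1", "value2"])

# This frame has duplicate column labels: value1, value2, value1, value2.
wide = pd.concat([df1, df2, df3], sort=True, axis=1)

# Transpose so the duplicate labels become row labels, group rows by label,
# and take GroupBy.first(), which returns the first non-null value per cell
# in left-to-right column order. Transpose back to restore the layout.
merged = wide.T.groupby(level=0).first().T
print(merged)
```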
Answer (score: 4)
Use DataFrame.combine_first, chaining one call per frame:
df = df1.combine_first(df2).combine_first(df3)
print(df)
value1 value2
1 1.0 NaN
3 3.0 NaN
4 4.0 NaN
5 5.0 5.0
6 6.0 6.0
7 7.0 7.0
8 8.0 12.0
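Why chaining works: `combine_first` keeps the calling frame's non-null values, fills only its NaN cells from the argument, and the result's row and column indexes are the union of both frames. So earlier frames in the chain take precedence. A minimal check on two small frames (`left` and `right` are hypothetical names, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frames: `left` has a gap at row 2; `right` overlaps rows 1-2 and adds row 3.
left = pd.DataFrame({"value1": [1.0, np.nan]}, index=[1, 2])
right = pd.DataFrame({"value1": [99.0, 20.0, 30.0]}, index=[1, 2, 3])

result = left.combine_first(right)
# Row 1: left already has 1.0, so right's 99.0 is ignored.
# Row 2: left is NaN there, so it is filled with right's 20.0.
# Row 3: only right has this row, so 30.0 comes from right.
print(result)
```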
A more general solution, which works with a list of DataFrames, is to use reduce:
from functools import reduce
dfs = [df1, df2, df3]
# Fold left to right: earlier frames take precedence, later ones only fill NaNs.
df = reduce(lambda l, r: l.combine_first(r), dfs)
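A self-contained version of the `reduce` approach, rebuilding the three frames from the question so it can be run on its own. With the list in this order, each cell takes its value from the leftmost frame that has a non-null entry there:

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame([[1], [3], [4]], index=[1, 3, 4], columns=["value1"])
df2 = pd.DataFrame([[5], [6], [7]], index=[5, 6, 7], columns=["value2"])
df3 = pd.DataFrame([[5, 9], [6, 10], [7, 11], [8, 12]],
                   index=[5, 6, 7, 8], columns=["value1", "value2"])

dfs = [df1, df2, df3]
# Equivalent to df1.combine_first(df2).combine_first(df3), for any list length.
df = reduce(lambda left, right: left.combine_first(right), dfs)
print(df)
```

Because `reduce` just folds `combine_first` over the list, reordering `dfs` changes which frame wins when two frames both have a value for the same cell.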