Suppose we have the following DataFrames:
import pandas as pd
import numpy as np
df1_column_array = [['foo', 'bar'],
['A', 'B']]
df1_column_tuple = list(zip(*df1_column_array))
df1_column_header = pd.MultiIndex.from_tuples(df1_column_tuple)
df1_index_array = [['one','two'],
['0', '1']]
df1_index_tuple = list(zip(*df1_index_array))
df1_index_header = pd.MultiIndex.from_tuples(df1_index_tuple)
df1 = pd.DataFrame(np.random.rand(2,2), columns = df1_column_header, index = df1_index_header)
print(df1)
            foo       bar
              A         B
one 1  0.755296  0.101329
two 2  0.925653  0.587948
df2_column_array = [['alpha', 'beta'],
['C', 'D']]
df2_column_tuple = list(zip(*df2_column_array))
df2_column_header = pd.MultiIndex.from_tuples(df2_column_tuple)
df2_index_array = [['three', 'four'],
['3', '4']]
df2_index_tuple = list(zip(*df2_index_array))
df2_index_header = pd.MultiIndex.from_tuples(df2_index_tuple)
df2 = pd.DataFrame(np.random.rand(2,2), columns = df2_column_header, index = df2_index_header)
print(df2)
            alpha      beta
                C         D
three 3  0.751013  0.957824
four  4  0.879353  0.045079
I would like to combine these DataFrames to produce:
              foo       bar     alpha      beta
                A         B         C         D
one   1  0.755296  0.101329       NaN       NaN
two   2  0.925653  0.587948       NaN       NaN
three 3       NaN       NaN  0.751013  0.957824
four  4       NaN       NaN  0.879353  0.045079
When I use concat, the order of the index is preserved, but the order of the columns is not:
df_joined = pd.concat([df1,df2])
print(df_joined)
            alpha       bar      beta       foo
                C         B         D         A
one   1       NaN  0.101329       NaN  0.755296
two   2       NaN  0.587948       NaN  0.925653
three 3  0.751013       NaN  0.957824       NaN
four  4  0.879353       NaN  0.045079       NaN
When I use join, the order of the columns is preserved, but the order of the index is not:
df_joined = df1.join(df2, how = 'outer')
print(df_joined)
              foo       bar     alpha      beta
                A         B         C         D
four  4       NaN       NaN  0.879353  0.045079
one   1  0.755296  0.101329       NaN       NaN
three 3       NaN       NaN  0.751013  0.957824
two   2  0.925653  0.587948       NaN       NaN
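How can I preserve the order of both the columns and the index when combining DataFrames?
Edit 1: Note that this is sample data. My real-world data has no convenient labels (such as 1, 2, 3, 4) to sort by.
Edit 2: When I apply the suggested solution to my real-world data, I get the following error: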
Exception: cannot handle a non-unique multi-index!
Answer 0 (score: 1)
You can use a hack: a first concat yields the MultiIndex in the original row order, which is then used to reindex the output of a second concat along axis=1:
idx = pd.concat([df1,df2]).index
df_joined = pd.concat([df1,df2], axis=1).reindex(idx)
print (df_joined)
              foo       bar     alpha      beta
                A         B         C         D
one   0  0.269298  0.819375       NaN       NaN
two   1  0.574702  0.798920       NaN       NaN
three 3       NaN       NaN  0.436893  0.822041
four  4       NaN       NaN  0.757332  0.271900
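To make the two steps easier to check, here is a minimal reproducible sketch of the same hack. The fixed values replacing np.random.rand, the use of MultiIndex.from_arrays, and the names row_order and col_order are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Fixed values instead of np.random.rand, so the result is reproducible.
df1 = pd.DataFrame(
    [[0.1, 0.2], [0.3, 0.4]],
    columns=pd.MultiIndex.from_arrays([["foo", "bar"], ["A", "B"]]),
    index=pd.MultiIndex.from_arrays([["one", "two"], ["0", "1"]]),
)
df2 = pd.DataFrame(
    [[0.5, 0.6], [0.7, 0.8]],
    columns=pd.MultiIndex.from_arrays([["alpha", "beta"], ["C", "D"]]),
    index=pd.MultiIndex.from_arrays([["three", "four"], ["3", "4"]]),
)

# Step 1: a row-wise concat keeps the rows in their original order,
# so its index records the row order we want.
idx = pd.concat([df1, df2]).index

# Step 2: a column-wise concat keeps the columns in their original order;
# reindex then puts the rows back into the order captured in step 1.
df_joined = pd.concat([df1, df2], axis=1).reindex(idx)

row_order = list(df_joined.index)
col_order = list(df_joined.columns)
print(df_joined)
```

Note that the reindex step requires the row index of the column-wise concat to be unique, which holds for this sample data.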
For a faster solution, build DataFrames from the MultiIndexes alone and take the index of their concat:
idx = pd.concat([pd.DataFrame(df1.index, index=df1.index),
pd.DataFrame(df2.index, index=df2.index)]).index
df_joined = pd.concat([df1,df2], axis=1).reindex(idx)
print (df_joined)
              foo       bar     alpha      beta
                A         B         C         D
one   0  0.007644  0.341335       NaN       NaN
two   1  0.332005  0.449688       NaN       NaN
three 3       NaN       NaN  0.281876  0.883299
four  4       NaN       NaN  0.880252  0.061797
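Since only the row order is needed, the index can probably be built without touching the data at all. The sketch below is an assumed alternative, not taken from the original answer: it appends one MultiIndex to the other with Index.append and compares the result with the index obtained from a full row-wise concat. The single-level columns x and y are a simplification for brevity.

```python
import pandas as pd

idx1 = pd.MultiIndex.from_arrays([["one", "two"], ["0", "1"]])
idx2 = pd.MultiIndex.from_arrays([["three", "four"], ["3", "4"]])

# Simplified frames: plain column labels, since only the row index matters here.
df1 = pd.DataFrame({"x": [1.0, 2.0]}, index=idx1)
df2 = pd.DataFrame({"y": [3.0, 4.0]}, index=idx2)

# Row order taken from a full row-wise concat, as in the first solution.
idx_from_concat = pd.concat([df1, df2]).index

# Assumed alternative: append the indexes directly, skipping the data.
idx_from_append = df1.index.append(df2.index)

same_index = idx_from_concat.equals(idx_from_append)
print(same_index)
```

If the two indexes match on your data as well, idx_from_append can be passed to reindex in place of idx.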
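EDIT1:
The problem with the previous solution is that reindex cannot handle duplicates in the index.
So if the MultiIndex in the columns has no duplicates, you can use: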
print(df1)
            foo       bar
              A         B
one 0  0.384705  0.932928
    0  0.539197  0.519196
print(df2)
            alpha      beta
                C         D
three 3  0.957530  0.985926
four  4  0.479828  0.350042
cols = df1.join(df2, how = 'outer').columns
df_joined = pd.concat([df1,df2]).reindex(columns=cols)
print (df_joined)
              foo       bar     alpha      beta
                A         B         C         D
one   0  0.384705  0.932928       NaN       NaN
      0  0.539197  0.519196       NaN       NaN
three 3       NaN       NaN  0.957530  0.985926
four  4       NaN       NaN  0.479828  0.350042
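A possible alternative, not part of the original answer: in recent pandas versions, concat accepts a sort argument, and passing sort=False is assumed here to leave the columns in the order in which they first appear. Because a row-wise concat does not reindex the rows, duplicated row labels should not be a problem. The sketch below uses fixed illustrative values and a df1 with a repeated row label; behavior may differ on older pandas versions.

```python
import pandas as pd

# df1 has a repeated row label, like the EDIT1 example above.
df1 = pd.DataFrame(
    [[0.1, 0.2], [0.3, 0.4]],
    columns=pd.MultiIndex.from_arrays([["foo", "bar"], ["A", "B"]]),
    index=pd.MultiIndex.from_arrays([["one", "one"], ["0", "0"]]),
)
df2 = pd.DataFrame(
    [[0.5, 0.6], [0.7, 0.8]],
    columns=pd.MultiIndex.from_arrays([["alpha", "beta"], ["C", "D"]]),
    index=pd.MultiIndex.from_arrays([["three", "four"], ["3", "4"]]),
)

# Quick check for the condition that makes reindex on the rows fail.
has_duplicates = not df1.index.is_unique

# sort=False asks concat not to sort the non-concatenation axis (the columns).
df_joined = pd.concat([df1, df2], sort=False)

row_order = list(df_joined.index)
col_order = list(df_joined.columns)
print(df_joined)
```

Checking index.is_unique on the real data first is a cheap way to tell whether the reindex-based solutions can apply at all.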