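I want to concatenate two DataFrames that have the same index but a different number of column levels. One DataFrame has hierarchical (MultiIndex) columns, the other does not.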
print(df1)
                A_1             A_2             A_3         .....
            Value_V Value_y Value_V Value_y Value_V Value_y
instance200      50       0    6500       1      50       0
instance201     100       0    6400       1      50       0
The other one:
print(df2)
                 PV Estimate
instance200 2002313  1231233
instance201 2134124  1124724
The result should look like this:
                 PV Estimate     A_1             A_2             A_3         .....
                             Value_V Value_y Value_V Value_y Value_V Value_y
instance200 2002313  1231233      50       0    6500       1      50       0
instance201 2134124  1124724     100       0    6400       1      50       0
But merging or joining the frames gives me a DataFrame with a flat, one-dimensional column index of tuples:
                 PV Estimate (A_1,Value_V) (A_1,Value_y) (A_2,Value_V) (A_2,Value_y) .....
instance200 2002313  1231233            50             0          6500             1
instance201 2134124  1124724           100             0          6400             1
How can I keep df1's hierarchical column index?
Answer 0 (score: 6)
Perhaps use good old assignment:
df3 = df1.copy()
df3[df2.columns] = df2
yields
                A_1             A_2             A_3              PV Estimate
            Value_V Value_y Value_V Value_y Value_V Value_y
instance200      50       0    6500       1      50       0 2002313  1231233
instance201     100       0    6400       1      50       0 2134124  1124724
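A self-contained version of the assignment approach, rebuilding the two frames from the values printed in the question (only the three A_ groups that are shown; the elided "....." columns are left out). As a sketch of what happens: when a plain label such as 'PV' is assigned into a frame whose columns have two levels, pandas pads the label to the tuple ('PV', ''), so df1's hierarchy survives.

```python
import pandas as pd

# Sample frames rebuilt from the values printed in the question
index = ["instance200", "instance201"]
cols = pd.MultiIndex.from_product([["A_1", "A_2", "A_3"], ["Value_V", "Value_y"]])
df1 = pd.DataFrame([[50, 0, 6500, 1, 50, 0],
                    [100, 0, 6400, 1, 50, 0]], index=index, columns=cols)
df2 = pd.DataFrame({"PV": [2002313, 2134124],
                    "Estimate": [1231233, 1124724]}, index=index)

# Assign df2's columns into a copy of df1; rows are aligned on the index
df3 = df1.copy()
df3[df2.columns] = df2

print(df3.columns.nlevels)        # 2: df1's column hierarchy is kept
print(df3.columns[-2:].tolist())  # the new labels, padded with "" below
```

Note that the new columns are appended after df1's columns; reorder them afterwards if PV and Estimate should come first.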
Answer 1 (score: 3)
You can do this by giving df2 the same number of column levels as df1:
In [11]: df1
Out[11]:
                A_1             A_2             A_3
            Value_V Value_y Value_V Value_y Value_V Value_y
instance200      50       0    6500       1      50       0
instance201     100       0    6400       1      50       0
In [12]: df2
Out[12]:
                 PV Estimate
instance200 2002313  1231233
instance201 2134124  1124724
In [13]: df2.columns = pd.MultiIndex.from_arrays([df2.columns, [None] * len(df2.columns)])
In [14]: df2
Out[14]:
                 PV Estimate
                NaN      NaN
instance200 2002313  1231233
instance201 2134124  1124724
Now you can concatenate without mangling the column names:
In [15]: pd.concat([df1, df2], axis=1)
Out[15]:
                A_1             A_2             A_3              PV Estimate
            Value_V Value_y Value_V Value_y Value_V Value_y     NaN      NaN
instance200      50       0    6500       1      50       0 2002313  1231233
instance201     100       0    6400       1      50       0 2134124  1124724
Note: to put the df2 columns first, use pd.concat([df2, df1], axis=1).
That said, I'm not sure I can think of a use case for this; keeping them as separate DataFrames might actually be the simpler solution...!
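The same steps as a self-contained script. One deliberate variation from the answer, offered as an assumption rather than the author's choice: the new lower level is filled with empty strings instead of None, so it prints blank rather than NaN, and because the lower label is "", selecting the top-level name returns a Series directly.

```python
import pandas as pd

index = ["instance200", "instance201"]
df1 = pd.DataFrame(
    [[50, 0, 6500, 1, 50, 0], [100, 0, 6400, 1, 50, 0]],
    index=index,
    columns=pd.MultiIndex.from_product([["A_1", "A_2", "A_3"], ["Value_V", "Value_y"]]),
)
df2 = pd.DataFrame({"PV": [2002313, 2134124], "Estimate": [1231233, 1124724]}, index=index)

# Give df2 a second (empty) column level so both frames have two levels
df2.columns = pd.MultiIndex.from_arrays([df2.columns, [""] * len(df2.columns)])

out = pd.concat([df1, df2], axis=1)  # use [df2, df1] to put PV/Estimate first
pv = out["PV"]                       # a Series, because the level below 'PV' is ""
```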
Answer 2 (score: 0)
I built a function for this purpose:
import pandas as pd

def concat( df1, df2 ):
    """
    Function concatenates two dataframes df1 and df2 even if the two dataframes
    have a different number of hierarchical column levels.
    In the case of one dataframe having more hierarchical column levels than the
    other, blank strings will be added to the upper hierarchical column levels
    of the other one.
    """
    def pad(df, n):
        # return a copy of df whose columns get n blank levels on top
        arrays = [[""] * len( df.columns )] * n
        arrays += [df.columns.get_level_values(i) for i in range(df.columns.nlevels)]
        df = df.copy()  # don't modify the caller's dataframe
        df.columns = pd.MultiIndex.from_arrays(arrays)
        return df

    nLevels1 = df1.columns.nlevels
    nLevels2 = df2.columns.nlevels
    if nLevels1 < nLevels2:
        # df1 has fewer levels: expand it with blank strings, then concat
        df1 = pad(df1, nLevels2 - nLevels1)
    elif nLevels1 > nLevels2:
        # df2 has fewer levels: expand it with blank strings, then concat
        df2 = pad(df2, nLevels1 - nLevels2)
    # with the same number of levels, simply concat as normal
    return pd.concat( [df1, df2 ], axis = 1 )
Now, if we pass the DataFrames
gender f m
n 2 1
y 2 2
and
gender f m
age old young old young
location london paris london paris london paris london
n 1 0 1 0 0 1 0
y 0 1 0 1 1 0 1
we get
f m
old young old young
london paris london paris london paris london f m
n 1 0 1 0 0 1 0 2 1
y 0 1 0 1 1 0 1 2 2
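Note that, going forward, it would be better to align the gender category so that both frames have it at the same level; this function is mainly meant for joining DataFrames with completely different columns.
Answer 3 (score: 0)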
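For comparison, a shorter way to add blank top levels is to let pd.concat create them from a dict key: pd.concat({"": df}, axis=1) prepends one column level labelled "". This is a different technique from the function above, not the author's; the helper name concat_padded and the three-level sample columns are hypothetical, while the small frame reuses the gender values shown above.

```python
import pandas as pd

def concat_padded(*dfs: pd.DataFrame) -> pd.DataFrame:
    """Concatenate frames column-wise, padding shallower column indexes
    with blank top levels (hypothetical helper)."""
    want = max(df.columns.nlevels for df in dfs)
    padded = []
    for df in dfs:
        while df.columns.nlevels < want:
            df = pd.concat({"": df}, axis=1)  # prepend one "" level on top
        padded.append(df)
    return pd.concat(padded, axis=1)

# Small frame from the example above; the three-level frame is made up
small = pd.DataFrame({"f": [2, 2], "m": [1, 2]}, index=["n", "y"])
big = pd.DataFrame(
    [[1, 0], [0, 1]],
    index=["n", "y"],
    columns=pd.MultiIndex.from_tuples(
        [("f", "old", "london"), ("m", "young", "paris")],
        names=["gender", "age", "location"],
    ),
)
result = concat_padded(big, small)
```

Each pass through the while loop adds exactly one level, so a flat frame joined to a three-level frame is wrapped twice.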
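I made a wrapper for the pandas.concat function that accepts DataFrames with unequal numbers of levels.
The empty levels are added from below. The advantage is that it allows accessing the Series (in df_cols below) with df_cols.c, and, when printing, it is clear that 'c' is not part of ('CC', 'one').
Hope this helps someone.
Test: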
import pandas as pd

def concat(dfs, axis=0, **kwargs):
    """
    Wrapper for `pandas.concat`; concatenate pandas objects even if they have
    an unequal number of levels on the concatenation axis.

    Levels containing empty strings are added from below (when concatenating along
    columns) or from the right (when concatenating along rows) to match the maximum
    number found in the dataframes.

    Parameters
    ----------
    dfs : Iterable
        Dataframes that must be concatenated.
    axis : int, optional
        Axis along which concatenation must take place. The default is 0.

    Returns
    -------
    pd.DataFrame
        Concatenated Dataframe.

    Notes
    -----
    Any keyword arguments are passed on to the `pandas.concat` function.

    See also
    --------
    pandas.concat
    """
    dfs = list(dfs)  # an iterable such as a generator must be traversed twice

    def index(df):
        return df.columns if axis == 1 else df.index

    def add_levels(df):
        need = want - index(df).nlevels
        if need > 0:
            df = pd.concat([df], keys=[('',) * need], axis=axis)  # prepend empty levels
            for i in range(want - need):  # move empty levels to bottom
                df = df.swaplevel(i, i + need, axis=axis)
        return df

    want = max(index(df).nlevels for df in dfs)
    dfs = [add_levels(df) for df in dfs]
    return pd.concat(dfs, axis=axis, **kwargs)
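To illustrate the advantage described above, here is a sketch that also pads with empty levels from below, but builds the new MultiIndex directly with pd.MultiIndex.from_arrays instead of the keys-plus-swaplevel route of the wrapper. The helper name concat_pad_below and the sample data (columns ('CC', 'one'), ('CC', 'two') and 'c') are assumptions, since the answer's own test data is not shown here.

```python
import pandas as pd

def concat_pad_below(dfs, axis=0, **kwargs):
    """Hypothetical helper: pad each frame's index (axis=0) or columns (axis=1)
    with "" levels at the bottom, then concatenate with pandas.concat."""
    dfs = list(dfs)

    def labels(df):
        return df.columns if axis == 1 else df.index

    want = max(labels(df).nlevels for df in dfs)
    padded = []
    for df in dfs:
        idx = labels(df)
        need = want - idx.nlevels
        if need > 0:
            arrays = [idx.get_level_values(i) for i in range(idx.nlevels)]
            arrays += [[""] * len(idx)] * need  # empty levels go below
            df = df.copy()
            new_idx = pd.MultiIndex.from_arrays(arrays)
            if axis == 1:
                df.columns = new_idx
            else:
                df.index = new_idx
        padded.append(df)
    return pd.concat(padded, axis=axis, **kwargs)

wide = pd.DataFrame([[1, 2], [3, 4]],
                    columns=pd.MultiIndex.from_tuples([("CC", "one"), ("CC", "two")]))
flat = pd.DataFrame({"c": [5, 6]})
df_cols = concat_pad_below([wide, flat], axis=1)

c = df_cols.c  # a Series: the "" level below 'c' is dropped on selection
```

Because 'c' becomes ('c', '') rather than ('', 'c'), it gets its own top-level label when printed, and pandas treats the trailing "" as a placeholder when selecting df_cols['c'] or df_cols.c.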