Given this code, which merges three pandas DataFrames:
import pandas as pd

raw_data = {
'type': [0, 1, 1],
'id': ['3', '4', '5'],
'name_1': ['Alex', 'Amy', 'Allen']}
df_a = pd.DataFrame(raw_data, columns = ['type', 'id', 'name_1' ])
df_a.set_index(['type', 'id'])  # returns a new DataFrame; the result is discarded here
raw_datab = {
'type': [1, 1, 1, 0],
'id': ['4', '5', '5', '7'],
'name_2': ['Billy', 'Brian', 'Joe', 'Bryce']}
df_b = pd.DataFrame(raw_datab, columns = ['type', 'id', 'name_2'])
df_b.set_index(['type', 'id'])  # returns a new DataFrame; the result is discarded here
raw_datac = {
'name_3': ['School', 'White', 'Jane', 'Homer'],
'id': ['4', '6', '5', '5'],
'type': [1, 1, 1, 1]}
df_c = pd.DataFrame(raw_datac, columns = ['name_3', 'id', 'type' ])
df_c.set_index(['type', 'id'])  # returns a new DataFrame; the result is discarded here
dfx = df_a.merge(df_b, how='outer').merge(df_c, how='outer')
print(dfx)
I get the following output:
type id name_1 name_2 name_3
0 0 3 Alex NaN NaN
1 1 4 Amy Billy School
2 1 5 Allen Brian Jane
3 1 5 Allen Brian Homer
4 1 5 Allen Joe Jane
5 1 5 Allen Joe Homer
6 0 7 NaN Bryce NaN
7 1 6 NaN NaN White
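The rows above are matched on the shared `type` and `id` columns, not on an index: `set_index` returns a new DataFrame, and because the question discards the results of its three `set_index` calls, each frame keeps its default integer index and `merge` falls back to joining on the columns the frames have in common. A minimal check on `df_a` (the name `indexed` is only illustrative):

```python
import pandas as pd

df_a = pd.DataFrame({'type': [0, 1, 1], 'id': ['3', '4', '5'],
                     'name_1': ['Alex', 'Amy', 'Allen']})

# set_index returns a new DataFrame; df_a itself is not modified.
indexed = df_a.set_index(['type', 'id'])

print(list(df_a.columns))         # ['type', 'id', 'name_1']  (unchanged)
print(list(indexed.columns))      # ['name_1']  (type and id moved into the index)
print(list(indexed.index.names))  # ['type', 'id']
```

To keep the indexed version, either assign the result or pass `inplace=True`; the answer below uses `inplace=True`.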
What I actually need is the columns of all three frames concatenated side by side, in their original order. For example:
type id name_1 type_2 id_2 name_2 name_3 id_3 type_3
0 3 Alex 0 3 NaN NaN 3 0
1 4 Amy 1 4 Billy School 4 1
1 5 Allen 1 5 Brian Jane 5 1
1 5 Allen 1 5 Brian Homer 5 1
1 5 Allen 1 5 Joe Jane 5 1
1 5 Allen 1 5 Joe Homer 5 1
0 7 NaN 0 7 Bryce NaN 7 0
1 6 NaN 1 6 NaN White 6 1
Is this possible with pandas?
Answer 0 (score: 0)
I think you can do it like this.
First, use this syntax to add a "copy" of the `type` and `id` columns to each DataFrame's index:
for df in (df_a, df_b, df_c):
    df.set_index([df['type'], df['id']], inplace=True)
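Note: compare df.set_index('Col1') with df.set_index(df['Col1']). The latter puts a copy of Col1 into the index and leaves the column in place, while the former moves the Col1 column into the index.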
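The difference described in the note can be checked on a tiny frame (`df`, `Col1` and `Col2` are illustrative names). Passing `drop=False` with a column label is another documented way to index on a column while keeping it:

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'b'], 'Col2': [1, 2]})

moved = df.set_index('Col1')             # column label: Col1 is moved into the index
copied = df.set_index(df['Col1'])        # Series key: Col1 is copied, the column stays
kept = df.set_index('Col1', drop=False)  # column label with drop=False: same effect as copied

print(list(moved.columns))   # ['Col2']
print(list(copied.columns))  # ['Col1', 'Col2']
print(list(kept.columns))    # ['Col1', 'Col2']
```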
Now merge the DataFrames on their indexes with how='outer', using suffixes to handle the duplicate column names:
df_a.merge(df_b, how='outer', suffixes=('_1',''), right_index=True, left_index=True)\
.merge(df_c, how='outer', suffixes=('_2','_3'), right_index=True, left_index=True)\
.reset_index()
Output:
type id type_1 id_1 name_1 type_2 id_2 name_2 name_3 id_3 type_3
0 0 3 0.0 3 Alex NaN NaN NaN NaN NaN NaN
1 0 7 NaN NaN NaN 0.0 7 Bryce NaN NaN NaN
2 1 4 1.0 4 Amy 1.0 4 Billy School 4 1.0
3 1 5 1.0 5 Allen 1.0 5 Brian Jane 5 1.0
4 1 5 1.0 5 Allen 1.0 5 Brian Homer 5 1.0
5 1 5 1.0 5 Allen 1.0 5 Joe Jane 5 1.0
6 1 5 1.0 5 Allen 1.0 5 Joe Homer 5 1.0
7 1 6 NaN NaN NaN NaN NaN NaN White 6 1.0
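Putting these steps together, here is a self-contained sketch with the question's data, written with intermediate variables (`ab` and `dfx` are illustrative names) instead of backslash continuations. It checks only the column layout and the row count, since row order after an outer join may differ between pandas versions:

```python
import pandas as pd

# The question's data, as plain DataFrames.
df_a = pd.DataFrame({'type': [0, 1, 1], 'id': ['3', '4', '5'],
                     'name_1': ['Alex', 'Amy', 'Allen']})
df_b = pd.DataFrame({'type': [1, 1, 1, 0], 'id': ['4', '5', '5', '7'],
                     'name_2': ['Billy', 'Brian', 'Joe', 'Bryce']})
df_c = pd.DataFrame({'name_3': ['School', 'White', 'Jane', 'Homer'],
                     'id': ['4', '6', '5', '5'], 'type': [1, 1, 1, 1]})

# Index each frame on (type, id) while keeping the original columns.
for df in (df_a, df_b, df_c):
    df.set_index([df['type'], df['id']], inplace=True)

# First merge: df_a's overlapping columns get '_1', df_b's stay unsuffixed.
ab = df_a.merge(df_b, how='outer', left_index=True, right_index=True,
                suffixes=('_1', ''))
# Second merge: the unsuffixed df_b columns get '_2', df_c's get '_3'.
dfx = ab.merge(df_c, how='outer', left_index=True, right_index=True,
               suffixes=('_2', '_3')).reset_index()

print(list(dfx.columns))
# ['type', 'id', 'type_1', 'id_1', 'name_1', 'type_2', 'id_2',
#  'name_2', 'name_3', 'id_3', 'type_3']
print(len(dfx))  # 8: the two (1, '5') rows of df_b and of df_c combine 2 x 2
```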
Edit: since we are merging on the indexes, we can use join instead:
(df_a.join(df_b, how='outer', lsuffix='_1')
     .join(df_c, how='outer', lsuffix='_2', rsuffix='_3')
     .reset_index())
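For reference, `DataFrame.join` on indexes behaves like `merge(..., left_index=True, right_index=True, suffixes=(lsuffix, rsuffix))`, and `rsuffix` defaults to an empty string, which is why the first `join` call only needs `lsuffix='_1'`. A small check with two illustrative frames, `left` and `right`:

```python
import pandas as pd

# Two small illustrative frames, indexed on (type, id) with the columns kept.
left = pd.DataFrame({'type': [0, 1], 'id': ['3', '4'], 'name_1': ['Alex', 'Amy']})
right = pd.DataFrame({'type': [1, 0], 'id': ['4', '7'], 'name_2': ['Billy', 'Bryce']})
for df in (left, right):
    df.set_index([df['type'], df['id']], inplace=True)

# join() on the indexes; rsuffix defaults to '' ...
via_join = left.join(right, how='outer', lsuffix='_1')
# ... and the equivalent merge() call with suffixes=(lsuffix, rsuffix).
via_merge = left.merge(right, how='outer', left_index=True, right_index=True,
                       suffixes=('_1', ''))

print(via_join.equals(via_merge))  # True
print(list(via_join.columns))      # ['type_1', 'id_1', 'name_1', 'type', 'id', 'name_2']
```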