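With pandas merge you can join two data frames, but I need to merge N of them, similar to an SQL statement that combines N tables in a full outer join. For example, I need to merge the three data frames below on 'type_1', 'subject_id_1'; 'type_2', 'subject_id_2'; and 'type_3', 'subject_id_3'. Is that possible?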
import pandas as pd
raw_data = {
'type_1': [1, 1, 0, 0, 1],
'subject_id_1': ['1', '2', '3', '4', '5'],
'first_name_1': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']}
df_a = pd.DataFrame(raw_data, columns = ['type_1', 'subject_id_1', 'first_name_1'])
raw_datab = {
'type_2': [1, 1, 0, 0, 0],
'subject_id_2': ['4', '5', '6', '7', '8'],
'first_name_2': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']}
df_b = pd.DataFrame(raw_datab, columns = ['type_2', 'subject_id_2', 'first_name_2'])
raw_datac = {
'type_3': [1, 1],
'subject_id_3': ['4', '5'],
'first_name_3': ['Joe', 'Paul']}
df_c = pd.DataFrame(raw_datac, columns = ['type_3', 'subject_id_3', 'first_name_3'])
### need to include here the third data frame
merged = pd.merge(df_a, df_b, left_on=['type_1','subject_id_1'],
right_on = ['type_2','subject_id_2'], how='outer')
print(merged)
Note: the names of the fields to join on differ in each data frame.
Answer 0 (score: 2)
I think you need to turn the key columns of each data frame into an index with set_index and then join on those indexes with concat:
dfs = [df_a.set_index(['type_1', 'subject_id_1']),
       df_b.set_index(['type_2', 'subject_id_2']),
       df_c.set_index(['type_3', 'subject_id_3'])]
df = pd.concat(dfs, axis=1)
print(df)
     first_name_1 first_name_2 first_name_3
0 3         Allen          NaN          NaN
  4         Alice          NaN          NaN
  6           NaN         Bran          NaN
  7           NaN        Bryce          NaN
  8           NaN        Betty          NaN
1 1          Alex          NaN          NaN
  2           Amy          NaN          NaN
  4           NaN        Billy          Joe
  5        Ayoung        Brian         Paul
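One caveat worth checking before relying on the concat step: as far as I know, pd.concat with axis=1 aligns rows by index label and raises an error when a frame's index contains duplicate labels (unless all the indexes are identical), so each (type, subject_id) pair should appear at most once per data frame. The snippet below is a minimal sketch of that check, not part of the original answer; df_dup is a hypothetical frame invented only to show the failing case.

```python
import pandas as pd

# Same data as df_a in the question.
df_a = pd.DataFrame({
    'type_1': [1, 1, 0, 0, 1],
    'subject_id_1': ['1', '2', '3', '4', '5'],
    'first_name_1': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})

keyed = df_a.set_index(['type_1', 'subject_id_1'])
print(keyed.index.is_unique)      # True: every key pair occurs once

# Hypothetical frame (not from the question) with a repeated key pair.
df_dup = pd.DataFrame({
    'type_1': [1, 1],
    'subject_id_1': ['1', '1'],
    'first_name_1': ['Alex', 'Alexa']})

dup_keyed = df_dup.set_index(['type_1', 'subject_id_1'])
print(dup_keyed.index.is_unique)  # False: (1, '1') occurs twice

# List the offending rows so they can be deduplicated or aggregated first.
dup_rows = df_dup[df_dup.duplicated(['type_1', 'subject_id_1'], keep=False)]
print(dup_rows)
```

If the check fails for any frame, the keys would need to be deduplicated or aggregated first, or a merge-based join used instead.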
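Because the key columns have different names in each data frame, the combined index has no level names. To get the keys back as ordinary columns with common names, chain rename_axis and reset_index:
df = pd.concat(dfs, axis=1).rename_axis(('type','subject_id')).reset_index()
print(df)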
   type subject_id first_name_1 first_name_2 first_name_3
0     0          3        Allen          NaN          NaN
1     0          4        Alice          NaN          NaN
2     0          6          NaN         Bran          NaN
3     0          7          NaN        Bryce          NaN
4     0          8          NaN        Betty          NaN
5     1          1         Alex          NaN          NaN
6     1          2          Amy          NaN          NaN
7     1          4          NaN        Billy          Joe
8     1          5       Ayoung        Brian         Paul
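For comparison, here is a different technique from the one in the answer above: rename the key columns to common names and fold pd.merge over the list of frames with functools.reduce. This is only a sketch under the assumption that 'type' and 'subject_id' are acceptable common key names (the same names the answer passes to rename_axis); it stays closer to the SQL full outer join described in the question and extends to N frames by adding entries to the list.

```python
from functools import reduce

import pandas as pd

df_a = pd.DataFrame({
    'type_1': [1, 1, 0, 0, 1],
    'subject_id_1': ['1', '2', '3', '4', '5'],
    'first_name_1': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({
    'type_2': [1, 1, 0, 0, 0],
    'subject_id_2': ['4', '5', '6', '7', '8'],
    'first_name_2': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})
df_c = pd.DataFrame({
    'type_3': [1, 1],
    'subject_id_3': ['4', '5'],
    'first_name_3': ['Joe', 'Paul']})

# Common key names (an assumption; any shared names would do).
keys = ['type', 'subject_id']

# Give every frame the same key column names so merge can use on=keys.
frames = [
    df_a.rename(columns={'type_1': 'type', 'subject_id_1': 'subject_id'}),
    df_b.rename(columns={'type_2': 'type', 'subject_id_2': 'subject_id'}),
    df_c.rename(columns={'type_3': 'type', 'subject_id_3': 'subject_id'}),
]

# Outer-merge the frames pairwise, left to right.
merged = reduce(
    lambda left, right: pd.merge(left, right, on=keys, how='outer'),
    frames)

# Sort by the keys so the row order does not depend on the merge order.
merged = merged.sort_values(keys).reset_index(drop=True)
print(merged)
```

On this data the result has the same nine key pairs as the concat approach. Unlike concat, merge also accepts repeated key pairs, in which case it produces one row per matching combination.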