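The goal of the code below is to perform a FULL OUTER JOIN across three pandas DataFrames. All records from every DataFrame should be printed, and if two or three records are related, they should be printed on the same row.
The fields used to relate the DataFrames are type_1 and id_1 in the first DataFrame, type_2 and id_2 in the second DataFrame, and type_3 and id_3 in the third DataFrame.
The problem is that the relationship between the second and third DataFrames is not working. Look at rows 11 and 13 of the result: they should be a single row, because type_2 = type_3 and id_2 = id_3. The expected output is the row 11 shown below; row 13 should not be printed. How can this be solved?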
11 NaN NaN NaN 7.0 8 KoKo 7.0 8 Kuku
Code:
import pandas as pd

# First DataFrame, keyed by (type_1, id_1)
raw_data = {
    'type_1': [0, 1, 1, 2, 2],
    'id_1': ['3', '4', '5', '3', '3'],
    'name_1': ['Alex', 'Amy', 'Allen', 'Peter', 'Liz']}
df_a = pd.DataFrame(raw_data, columns=['type_1', 'id_1', 'name_1'])

# Second DataFrame, keyed by (type_2, id_2)
raw_datab = {
    'type_2': [1, 1, 1, 0, 7],
    'id_2': ['4', '5', '5', '7', '8'],
    'name_2': ['Billy', 'Brian', 'Joe', 'Bryce', 'KoKo']}
df_b = pd.DataFrame(raw_datab, columns=['type_2', 'id_2', 'name_2'])

# Third DataFrame, keyed by (type_3, id_3)
raw_datac = {
    'type_3': [1, 1, 1, 1, 2, 2, 7],
    'id_3': ['4', '6', '5', '5', '3', '3', '8'],
    'name_3': ['School', 'White', 'Jane', 'Homer', 'Paul', 'Lorel', 'Kuku']}
df_c = pd.DataFrame(raw_datac, columns=['type_3', 'id_3', 'name_3'])

merged = df_a
# Outer-join df_b using the keys of the first DataFrame
merged = merged.merge(df_b, how='outer', left_on=['type_1', 'id_1'],
                      right_on=['type_2', 'id_2'])
# Outer-join df_c, again using the keys of the first DataFrame
merged = merged.merge(df_c, how='outer', left_on=['type_1', 'id_1'],
                      right_on=['type_3', 'id_3'])
print(merged)
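A quick check can show where the link between the second and third DataFrames is lost. The sketch below rebuilds df_a and df_b from the question so that it runs on its own; the names step1 and only_b are introduced here for illustration only.

```python
import pandas as pd

# Same data as in the question, rebuilt so the snippet is self-contained.
df_a = pd.DataFrame({'type_1': [0, 1, 1, 2, 2],
                     'id_1': ['3', '4', '5', '3', '3'],
                     'name_1': ['Alex', 'Amy', 'Allen', 'Peter', 'Liz']})
df_b = pd.DataFrame({'type_2': [1, 1, 1, 0, 7],
                     'id_2': ['4', '5', '5', '7', '8'],
                     'name_2': ['Billy', 'Brian', 'Joe', 'Bryce', 'KoKo']})

# First outer merge, with the same arguments as in the question.
step1 = df_a.merge(df_b, how='outer',
                   left_on=['type_1', 'id_1'],
                   right_on=['type_2', 'id_2'])

# Rows that exist only in df_b (illustrative name: only_b) have no values
# in the df_a key columns type_1 and id_1.
only_b = step1[step1['name_1'].isna()]
print(only_b[['type_1', 'id_1', 'type_2', 'id_2', 'name_2']])
```

Because the second merge uses type_1 and id_1 as its left keys, the KoKo row has no key values to match against (7, '8') in df_c, so Kuku ends up on a separate row.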
Answer 0 (score: 2)
You need to create common key columns before calling merge:
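# Give every DataFrame the same pair of key columns
df_a[['key1','key2']]=df_a[['type_1', 'id_1']]
df_b[['key1','key2']]=df_b[['type_2', 'id_2']]
df_c[['key1','key2']]=df_c[['type_3', 'id_3']]
merged = df_a
# With no "on" argument, merge joins on the shared columns key1 and key2
merged = merged.merge(df_b, how='outer')
merged = merged.merge(df_c, how='outer')
# Drop the helper keys; the positional axis argument is not accepted by recent pandas
merged.drop(columns=['key1','key2'])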
Out[81]:
type_1 id_1 name_1 type_2 id_2 name_2 type_3 id_3 name_3
0 0.0 3 Alex NaN NaN NaN NaN NaN NaN
1 1.0 4 Amy 1.0 4 Billy 1.0 4 School
2 1.0 5 Allen 1.0 5 Brian 1.0 5 Jane
3 1.0 5 Allen 1.0 5 Brian 1.0 5 Homer
4 1.0 5 Allen 1.0 5 Joe 1.0 5 Jane
5 1.0 5 Allen 1.0 5 Joe 1.0 5 Homer
6 2.0 3 Peter NaN NaN NaN 2.0 3 Paul
7 2.0 3 Peter NaN NaN NaN 2.0 3 Lorel
8 2.0 3 Liz NaN NaN NaN 2.0 3 Paul
9 2.0 3 Liz NaN NaN NaN 2.0 3 Lorel
10 NaN NaN NaN 0.0 7 Bryce NaN NaN NaN
11 NaN NaN NaN 7.0 8 KoKo 7.0 8 Kuku
12 NaN NaN NaN NaN NaN NaN 1.0 6 White
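The same idea can be wrapped in a small helper when there are more than two DataFrames to chain. This is only a sketch under stated assumptions: full_outer_join and its frames_with_keys parameter are hypothetical names, not part of pandas, and every DataFrame is assumed to have exactly two key columns and no other column names in common.

```python
from functools import reduce

import pandas as pd


def full_outer_join(frames_with_keys):
    """Hypothetical helper: frames_with_keys is a list of
    (DataFrame, [type_col, id_col]) pairs."""
    keyed = []
    for df, (type_col, id_col) in frames_with_keys:
        # assign() returns a copy, so the caller's DataFrames are not modified.
        keyed.append(df.assign(key1=df[type_col], key2=df[id_col]))
    # Chain the outer merges on the shared helper keys.
    merged = reduce(
        lambda left, right: left.merge(right, how='outer', on=['key1', 'key2']),
        keyed)
    return merged.drop(columns=['key1', 'key2'])


df_a = pd.DataFrame({'type_1': [0, 1, 1, 2, 2],
                     'id_1': ['3', '4', '5', '3', '3'],
                     'name_1': ['Alex', 'Amy', 'Allen', 'Peter', 'Liz']})
df_b = pd.DataFrame({'type_2': [1, 1, 1, 0, 7],
                     'id_2': ['4', '5', '5', '7', '8'],
                     'name_2': ['Billy', 'Brian', 'Joe', 'Bryce', 'KoKo']})
df_c = pd.DataFrame({'type_3': [1, 1, 1, 1, 2, 2, 7],
                     'id_3': ['4', '6', '5', '5', '3', '3', '8'],
                     'name_3': ['School', 'White', 'Jane', 'Homer',
                                'Paul', 'Lorel', 'Kuku']})

result = full_outer_join([(df_a, ['type_1', 'id_1']),
                          (df_b, ['type_2', 'id_2']),
                          (df_c, ['type_3', 'id_3'])])
print(result)
```

Because the helper keys are kept through every merge, a record that appears only in the second DataFrame can still be matched against the third, which is what the original left_on=['type_1', 'id_1'] approach could not do.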