For example, I have three tables: A, B, and C.
Table A:
id1 value1
1 23
2 34
3 2342
4 333
Table B:
id2 value2
1 apple
2 banana
3 berry
Table C:
id3  value3  value4
1    red     batman
2    green   superman
3    white   wonder woman
4    gray    aquaman
5    yellow  flash
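I want to merge these three tables according to an index table D, where blank cells mean there is no matching row.
Table D:
Table_A  Table_B  Table_C
1        3        2
3                 4
2        2        3
4        1        1
                  5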
My result table should be:
id1  value1  id2  value2  id3  value3  value4
1    23      3    berry   2    green   superman
3    2342                 4    gray    aquaman
2    34      2    banana  3    white   wonder woman
4    333     1    apple   1    red     batman
                          5    yellow  flash
Can I do this with Python pandas, or do I need to do it in Spark?
Answer 0 (score: 0)
Try this:
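# Look up value1 and value2 by the ids listed in table_d; ids with no match become NaN
table_d['value1'] = table_d['Table_A'].map(table_a.set_index('id1')['value1'])
table_d['value2'] = table_d['Table_B'].map(table_b.set_index('id2')['value2'])
# Attach the Table C columns; every Table_C id exists in table_c, so no rows are dropped
table_d.merge(table_c, left_on='Table_C', right_on='id3')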
Output:
   Table_A  Table_B  Table_C  value1  value2  id3  value3        value4
0      1.0      3.0        2    23.0   berry    2   green      superman
1      3.0      NaN        4  2342.0     NaN    4    gray       aquaman
2      2.0      2.0        3    34.0  banana    3   white  wonder woman
3      4.0      1.0        1   333.0   apple    1     red        batman
4      NaN      NaN        5     NaN     NaN    5  yellow         flash
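If you want the exact column layout from the question (id1, value1, id2, value2, id3, value3, value4), one possible variant is to chain three left merges and then drop the Table_A/Table_B/Table_C helper columns. This is only a sketch: it rebuilds the sample data from the question, assumes the blank cells in Table D are NaN, and the name `result` is mine, not from the original post.

```python
import pandas as pd

# Sample data rebuilt from the question.
table_a = pd.DataFrame({"id1": [1, 2, 3, 4], "value1": [23, 34, 2342, 333]})
table_b = pd.DataFrame({"id2": [1, 2, 3], "value2": ["apple", "banana", "berry"]})
table_c = pd.DataFrame({
    "id3": [1, 2, 3, 4, 5],
    "value3": ["red", "green", "white", "gray", "yellow"],
    "value4": ["batman", "superman", "wonder woman", "aquaman", "flash"],
})

# Assumption: blank cells in Table D are missing values (NaN).
nan = float("nan")
table_d = pd.DataFrame({
    "Table_A": [1, 3, 2, 4, nan],
    "Table_B": [3, nan, 2, 1, nan],
    "Table_C": [2, 4, 3, 1, 5],
})

# how="left" keeps every row of table_d in its original order;
# ids without a match simply produce NaN in the joined columns.
result = (
    table_d
    .merge(table_a, how="left", left_on="Table_A", right_on="id1")
    .merge(table_b, how="left", left_on="Table_B", right_on="id2")
    .merge(table_c, how="left", left_on="Table_C", right_on="id3")
    .drop(columns=["Table_A", "Table_B", "Table_C"])
)

print(result)
```

Because id1 and id2 contain missing values, pandas stores them as floats (1.0, 3.0, ...). Casting those columns with astype("Int64") should bring back integer display if that matters. For tables of this size pandas is enough; Spark would only become relevant if the data no longer fits in memory.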