I have two DataFrames, dd1 and dd2, and I want to join them.
dd1:
id name
1 red
2 green
3 yellow
4 black
5 pink
6 blue
7 white
8 grey
dd2:
id name1
1 banana
2 apple
4 orange
8 grapes
9 leamon
I want the result to keep every row of dd1, like this:
id name name1
1 red banana
2 green apple
3 yellow NULL
4 black orange
5 pink NULL
6 blue NULL
7 white NULL
8 grey grapes
Answer 0 (score: 0)
You can try the following code:
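from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuses the existing session in a notebook or the pyspark shell
df = spark.createDataFrame(
    [(1,'red'),(2,'green'),(3,'yellow'),(4,'black'),(5,'pink'),
     (6,'blue'),(7,'white'),(8,'grey')], ["id", "name"])
df.show()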
+---+------+
| id|  name|
+---+------+
|  1|   red|
|  2| green|
|  3|yellow|
|  4| black|
|  5|  pink|
|  6|  blue|
|  7| white|
|  8|  grey|
+---+------+
df1 = spark.createDataFrame(
    [(1,'banana'),(2,'apple'),(4,'orange'),(8,'grapes'),(9,'leamon')], ["id1", "name1"])
df1.show()
+---+------+
|id1| name1|
+---+------+
|  1|banana|
|  2| apple|
|  4|orange|
|  8|grapes|
|  9|leamon|
+---+------+
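# A left join keeps every row of df; rows with no match in df1 get null in name1
condition = [df.id == df1.id1]
left_join = df.join(df1, condition, how='left')
left_join = left_join.drop("id1")      # drop the duplicate key column from df1
left_join = left_join.orderBy("id")
left_join.show()                       # in a Databricks notebook, display(left_join) also works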
+---+------+------+
| id|  name| name1|
+---+------+------+
|  1|   red|banana|
|  2| green| apple|
|  3|yellow|  null|
|  4| black|orange|
|  5|  pink|  null|
|  6|  blue|  null|
|  7| white|  null|
|  8|  grey|grapes|
+---+------+------+
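To see why ids 3, 5, 6 and 7 end up with null and why id 9 disappears, the left-join semantics can be sketched in plain Python. This is only an illustration and not Spark code: the helper name left_join and the two lists are hypothetical, and it assumes each id appears at most once on the right side.

```python
# Plain-Python illustration of left-join semantics (not Spark code).
# The names below are hypothetical and only mirror the sample data.
left_rows = [(1, 'red'), (2, 'green'), (3, 'yellow'), (4, 'black'),
             (5, 'pink'), (6, 'blue'), (7, 'white'), (8, 'grey')]
right_rows = [(1, 'banana'), (2, 'apple'), (4, 'orange'),
              (8, 'grapes'), (9, 'leamon')]


def left_join(left, right):
    """Keep every left row; attach the right value for a matching id, else None."""
    lookup = dict(right)  # id -> name1, assumes ids are unique on the right side
    return [(row_id, name, lookup.get(row_id)) for row_id, name in left]


result = left_join(left_rows, right_rows)
for row in result:
    print(row)
```

Every row from the left side survives, unmatched rows get None (Spark shows this as null), and id 9 is dropped because it exists only on the right side. In PySpark itself, if both DataFrames use the same key column name, passing on="id" to join should return a single id column, which makes the drop step unnecessary.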