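I am new to Scala and Spark and need some help.
I am trying to join two DataFrames, with the first DataFrame repeated recursively. Each recursion should be numbered in the output. An example follows.
The first DataFrame is: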
people.show()
+-----------+-------------+--------+
| record_id | record_name | branch |
+-----------+-------------+--------+
| 1 | one | 9 |
| 2 | two | 8 |
| 3 | three | 9 |
| 4 | four | 13 |
+-----------+-------------+--------+
The second DataFrame is:
branch.show()
+--------+-------------------+
| branch | branch_supervisor |
+--------+-------------------+
| 9 | 10 |
| 8 | 11 |
| 10 | 12 |
| 11 | 12 |
| 13 | 14 |
+--------+-------------------+
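The result should join the two DataFrames on branch, then recursively take the branch_supervisor from that join and look it up as a branch again, repeating until no further branch_supervisor can be found. Each level of the hierarchy should also be numbered. For example, for record_id 1, level 1 is branch 9 with branch_supervisor 10, level 2 is branch 10 with branch_supervisor 12, and so on.
The output should look like this: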
+-----------+-------------+--------+-------------------+-------+
| record_id | record_name | branch | branch_supervisor | level |
+-----------+-------------+--------+-------------------+-------+
| 1 | one | 9 | 10 | 1 |
| 1 | one | 10 | 12 | 2 |
| 2 | two | 8 | 11 | 1 |
| 2 | two | 11 | 12 | 2 |
| 3 | three | 9 | 10 | 1 |
| 3 | three | 10 | 12 | 2 |
| 4 | four | 13 | 14 | 1 |
+-----------+-------------+--------+-------------------+-------+
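To make the steps above concrete, here is a rough sketch of what I have in mind: an iterative loop that joins on branch, then feeds each branch_supervisor back in as the next branch until nothing matches. The names `BranchHierarchy`, `expand`, and `maxDepth` are my own placeholders, not anything from Spark, and the depth cap is an assumption to guard against cycles in the branch data. It also assumes Spark 2.4 or later for `Dataset.isEmpty`. I am not sure this is the idiomatic way to do it.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object BranchHierarchy {
  // Hypothetical helper: walks up the branch -> branch_supervisor chain,
  // one join per level, and tags every row with the level it came from.
  // maxDepth is an assumed safety cap in case the branch data has a cycle.
  def expand(people: DataFrame, branch: DataFrame, maxDepth: Int = 20): DataFrame = {
    val columns = Seq("record_id", "record_name", "branch", "branch_supervisor", "level")

    // Level 1: plain inner join of people with branch on the branch column.
    var current = people
      .join(branch, Seq("branch"))
      .withColumn("level", lit(1))
      .select(columns.map(col): _*)

    var result = current
    var level = 1

    // Stop when the last join matched nothing or the depth cap is reached.
    while (level < maxDepth && !current.isEmpty) {
      level += 1
      // The supervisor found at the previous level becomes the branch to look up.
      val next = current
        .select(col("record_id"), col("record_name"), col("branch_supervisor").as("branch"))
        .join(branch, Seq("branch"))
        .withColumn("level", lit(level))
        .select(columns.map(col): _*)

      result = result.union(next)
      current = next
    }

    result.orderBy("record_id", "level")
  }

  def main(args: Array[String]): Unit = {
    // Local session for trying the sketch out; settings are assumptions.
    val spark = SparkSession.builder().master("local[*]").appName("branch-hierarchy").getOrCreate()
    import spark.implicits._

    val people = Seq((1, "one", 9), (2, "two", 8), (3, "three", 9), (4, "four", 13))
      .toDF("record_id", "record_name", "branch")
    val branch = Seq((9, 10), (8, 11), (10, 12), (11, 12), (13, 14))
      .toDF("branch", "branch_supervisor")

    expand(people, branch).show()
    spark.stop()
  }
}
```

Each pass through the loop triggers a Spark job for the `isEmpty` check, so this only seems reasonable to me when the hierarchy is shallow; for deep hierarchies something like caching or a graph-based approach may be needed.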
Is this possible?