Below is a sample dataset of transactions in which "t_id" and "parent_id" have a dependency relationship.
t_id  first_name  parent_id  amount  dept_id  sal       datetime_updated
1     Jared       None       1000    5        4088908   13/10/2017
2     Jared       1          -5000   1        8033313   17/10/2018
3     Jared       2          1000    5        17373148  23/07/2018
4     Tucker      None       10000   3        16320817  08/09/2018
5     Tucker      4          -10000  2        5094970   24/08/2017
6     Tucker      5          5000    1        7435169   09/11/2018
7     Tucker      5          -2500   5        7859621   21/12/2018
8     Tucker      4          3000    2        5639934   14/07/2018
The query I am using:
select
t1.t_id ,
t1.first_name,
t1.amount,
t1.parent_id,
t2.t_id ,
t2.first_name,
t2.amount,
t2.parent_id,
t3.t_id ,
t3.first_name,
t3.amount,
t3.parent_id,
t4.t_id ,
t4.first_name,
t4.amount,
t4.parent_id
from Transactions t1
left join Transactions t2
on t1.parent_id = t2.t_id
left join Transactions t3
on t2.parent_id = t3.t_id
left join Transactions t4
on t3.parent_id = t4.t_id;
Output of the above query:
+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+
| t_id | first_name | amount | parent_id | t_id | first_name | amount | parent_id | t_id | first_name | amount | parent_id | t_id | first_name | amount | parent_id |
+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+
| 1 | Jared | 1000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 2 | Jared | -5000 | 1 | 1 | Jared | 1000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 3 | Jared | 1000 | 2 | 2 | Jared | -5000 | 1 | 1 | Jared | 1000 | 0 | NULL | NULL | NULL | NULL |
| 4 | Tucker | 10000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 5 | Tucker | -10000 | 4 | 4 | Tucker | 10000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 6 | Tucker | 5000 | 5 | 5 | Tucker | -10000 | 4 | 4 | Tucker | 10000 | 0 | NULL | NULL | NULL | NULL |
| 7 | Tucker | -2500 | 5 | 5 | Tucker | -10000 | 4 | 4 | Tucker | 10000 | 0 | NULL | NULL | NULL | NULL |
| 8 | Thane | 3000 | 4 | 4 | Tucker | 10000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 9 | Nicholas | 1000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 10 | Mason | 2000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 11 | Noah | 5000 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+
Issue/Problem
I want to generate the same output as the results shown above,
but I cannot use the join conditions above,
because the query fails on a larger dataset when run in spark-sql.
Is there any other way to optimise the query so that it produces the same
kind of data?
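
To make the required result concrete, here is a minimal sketch in plain Python (not Spark) of what the three self-joins compute: for each transaction, follow "parent_id" up to three levels and leave the remaining levels empty, as a left join would. The names ROWS and ancestor_chain are hypothetical and only for illustration, and only the four columns selected in the query are included. Whether a lookup-table approach like this carries over to spark-sql depends on the lookup fitting in memory, which is an assumption I have not verified.

```python
# Sample rows from the dataset above: (t_id, first_name, parent_id, amount).
# None stands for a missing parent, like NULL in SQL.
ROWS = [
    (1, "Jared", None, 1000),
    (2, "Jared", 1, -5000),
    (3, "Jared", 2, 1000),
    (4, "Tucker", None, 10000),
    (5, "Tucker", 4, -10000),
    (6, "Tucker", 5, 5000),
    (7, "Tucker", 5, -2500),
    (8, "Tucker", 4, 3000),
]


def ancestor_chain(rows, max_depth=3):
    """Return [t1, t2, t3, t4] per row, mimicking the chained left joins.

    t2 is the parent of t1, t3 the parent of t2, and so on. A level with
    no matching parent is None, and so is every level after it.
    """
    by_id = {row[0]: row for row in rows}  # t_id -> row lookup
    result = []
    for row in rows:
        chain = [row]
        current = row
        for _ in range(max_depth):
            # Look up the parent only if the previous level exists.
            parent = by_id.get(current[2]) if current is not None else None
            chain.append(parent)
            current = parent
        result.append(chain)
    return result


if __name__ == "__main__":
    for chain in ancestor_chain(ROWS):
        # Print only the t_id at each level to keep the output short.
        print([level[0] if level else None for level in chain])
```

Each inner list corresponds to one output row of the query, with the t1 to t4 column groups replaced by whole rows.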