Question

I ran the following experiment.

Query 1:

 select f1, f2 from A where id = 10 limit 1

 | f1  |  f2  |
 --------------
 |  1  |  2   |

Query 2:

 select * from B as b where b.f1 = 1 and b.f2 = 2 limit 1

Query 1 and Query 2 both run very fast.

But when I run the following:

 select B.* 
 from B join A 
 on B.f1 = A.f1 and B.f2 = A.f2 
 where A.id = 10 limit 1

It runs very slowly, with many stages and tasks...

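I assumed that, given the "limit 1", the last query would be no more expensive than Query 1 and Query 2 together. Its plan is shown below. Does it indicate that the limit 1 is applied only after the whole join has completed...?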

== Optimized Logical Plan ==
GlobalLimit 1
+- LocalLimit 1
   +- Join Inner, ((obj_id#352L = obj_id#342L) && (obj_type#351 = obj_type#341))
      :- Project [uid#350L, obj_type#351, obj_id#352L]
      :  +- Filter ...
      :     +- Relation[...] parquet
      +- Aggregate [obj_id#342L, obj_type#341], [obj_id#342L, obj_type#341]
         +- Project [obj_type#341, obj_id#342L]
            +- Filter ...
               +- Relation[...] parquet

Does Spark SQL take the limit into account when performing a join?
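
For reference, the equivalence I have in mind can be sketched as follows. This is only a toy illustration, not Spark: it uses Python's built-in sqlite3 module as a stand-in engine, and the table contents, the `payload` column, and the helper names `two_step` and `joined` are made up for the example.

```python
import sqlite3

# Stand-in setup: SQLite instead of Spark, a few toy rows instead of parquet files.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, f1 INTEGER, f2 INTEGER);
    CREATE TABLE B (f1 INTEGER, f2 INTEGER, payload TEXT);
    INSERT INTO A VALUES (10, 1, 2), (11, 3, 4);
    INSERT INTO B VALUES (1, 2, 'match'), (3, 4, 'other'), (5, 6, 'unrelated');
""")


def two_step(conn, a_id):
    """Run Query 1, then feed its result into Query 2 as literal values."""
    keys = conn.execute(
        "SELECT f1, f2 FROM A WHERE id = ? LIMIT 1", (a_id,)
    ).fetchone()
    if keys is None:
        return None  # no row in A, so there is nothing to look up in B
    return conn.execute(
        "SELECT * FROM B AS b WHERE b.f1 = ? AND b.f2 = ? LIMIT 1", keys
    ).fetchone()


def joined(conn, a_id):
    """The single join query from the question."""
    return conn.execute(
        "SELECT B.* FROM B JOIN A ON B.f1 = A.f1 AND B.f2 = A.f2 "
        "WHERE A.id = ? LIMIT 1",
        (a_id,),
    ).fetchone()


print(two_step(conn, 10))
print(joined(conn, 10))
```

With this toy data both calls return the same row of B. The two forms are not strictly equivalent in general: if several rows of A share the same id, the "limit 1" in Query 1 may pick a key pair that has no match in B, while the join could still find a match through another row.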

0 answers: