Hive - Optimizing a self join

Time: 2014-11-01 01:53:59

Tags: sql hadoop hive bigdata

Let's say I have the following query:

select a.model, a.engine_size, b.engine_size from (

  select model, engine_size
  from cars
  where number_of_doors = 4
) a

inner join (

  select model, engine_size
  from cars
  where number_of_doors = 4
) b

on (a.model = b.model);

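I'm repeating a subquery here. I'm just wondering whether the following is more optimized, or whether the result of a repeated subquery is cached automatically?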

with features as (

  select model, engine_size
  from cars
  where number_of_doors = 4
)

select a.model, a.engine_size, b.engine_size
from features a
inner join features b
on (a.model = b.model);

Would either of these be more efficient than the other?

Thanks!

1 Answer:

Answer 0: (score: 0)

One way is to self join the table directly, although the scenario doesn't make much sense:

select a.model, a.engine_size, b.engine_size
from   cars a
join   cars b
on     (a.model = b.model)
where  a.number_of_doors = 4
and    b.number_of_doors = 4;
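
To see whether Hive actually treats the two forms from the question differently, one option is to compare their query plans with EXPLAIN. This is only a sketch, assuming the cars table from the question. The plan output depends on the Hive version and settings, so none is shown here.

```sql
-- Plan for the CTE form from the question.
explain
with features as (
  select model, engine_size
  from cars
  where number_of_doors = 4
)
select a.model, a.engine_size, b.engine_size
from features a
inner join features b
on (a.model = b.model);

-- Plan for the repeated-subquery form, for comparison.
explain
select a.model, a.engine_size, b.engine_size
from (
  select model, engine_size from cars where number_of_doors = 4
) a
inner join (
  select model, engine_size from cars where number_of_doors = 4
) b
on (a.model = b.model);
```

If both plans show the same scans of cars and the same join stages, the CTE is most likely just a more readable way to write the same query rather than a cached intermediate result.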