Question

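I'm working with a large dataset and need to optimize a query. I have a view, abc_view, which the query below references four times. The view contains complex logic, and that logic is evaluated every time the view is referenced. How can I structure the query so that the view is executed only once?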

    Select * from TableA a
    join abc_view v on(a.col1=v.line)
    where v.type='abc'
    union all
    Select * from TableA a
    join abc_view v on(a.col1=v.group)
    where v.type='bcd'
    union all
    Select * from TableA a
    join abc_view v on(a.col1=v.cat)
    where v.type='cde'
    union all
    Select * from TableA a
    join abc_view v on(a.col1=v.test)
    where v.type='def'

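The query takes about 5 minutes to run. Should I create a table from abc_view and use that table in the query instead, or is there a better approach?

Please suggest how to optimize this query.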

Answer 1

A single reference to the view may improve performance, but there is no guarantee. In standard SQL, you could write:

    Select *
    from TableA a join
         abc_view v
         on (a.col1 = v.line and v.type = 'abc' ) or
            (a.col1 = v.group and v.type = 'bcd' ) or
            (a.col1 = v.cat and v.type = 'cde' ) or
            (a.col1 = v.test and v.type = 'def' );

However, Hive may reject this, since it may not accept OR in a join condition.
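If the OR-based join condition is rejected, one possible workaround is to compute the join key inside a derived table, so the outer join becomes a plain equality while the view is still referenced only once. This is only a sketch under assumptions: `join_key` is a hypothetical column name, the four columns are assumed to have a type comparable with `a.col1`, and `group` is backquoted on the assumption that Hive treats it as a reserved word.

```sql
-- Sketch: single reference to abc_view, equality-only join condition.
SELECT *
FROM TableA a
JOIN (
    SELECT t.*,
           -- Pick the column that matters for this row's type.
           CASE t.type
               WHEN 'abc' THEN t.line
               WHEN 'bcd' THEN t.`group`
               WHEN 'cde' THEN t.cat
               WHEN 'def' THEN t.test
           END AS join_key          -- hypothetical helper column
    FROM abc_view t
    WHERE t.type IN ('abc', 'bcd', 'cde', 'def')
) v
  ON a.col1 = v.join_key;
```

Each view row has exactly one type, so it can match only through the column paired with that type, which is the same pairing the four UNION ALL branches enforce. Note that `SELECT *` will also return the extra `join_key` column; list the columns explicitly if that matters.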

I'm not sure whether Hive implements CTEs. If it does, this might solve your problem:

    with v as (select * from abc_view)
    Select *
    from TableA a join
         v
         on a.col1 = v.line
    where v.type = 'abc'
    union all
    Select *
    from TableA a join
         v
         on a.col1 = v.group
    where v.type = 'bcd'
    union all
    Select *
    from TableA a join
         v
         on a.col1 = v.cat
    where v.type = 'cde'
    union all
    Select *
    from TableA a join
         v
         on a.col1 = v.test
    where v.type = 'def';

If not, you may have to use a temporary table.
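A minimal sketch of the temporary-table approach, assuming a Hive version that supports `CREATE TEMPORARY TABLE ... AS SELECT`. The table name `abc_view_tmp` is hypothetical, and `group` is backquoted on the assumption that Hive treats it as a reserved word.

```sql
-- Run the view's logic once and keep only the rows the query needs.
-- abc_view_tmp is a hypothetical name; the table lasts for the session only.
CREATE TEMPORARY TABLE abc_view_tmp AS
SELECT *
FROM abc_view
WHERE type IN ('abc', 'bcd', 'cde', 'def');

-- The original query, now reading the materialized rows instead of the view.
SELECT * FROM TableA a
JOIN abc_view_tmp v ON a.col1 = v.line
WHERE v.type = 'abc'
UNION ALL
SELECT * FROM TableA a
JOIN abc_view_tmp v ON a.col1 = v.`group`
WHERE v.type = 'bcd'
UNION ALL
SELECT * FROM TableA a
JOIN abc_view_tmp v ON a.col1 = v.cat
WHERE v.type = 'cde'
UNION ALL
SELECT * FROM TableA a
JOIN abc_view_tmp v ON a.col1 = v.test
WHERE v.type = 'def';
```

Whether this beats the original depends on how expensive the view is relative to writing and rereading its output, so it is worth timing both versions on real data.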

Answer 2

Try this; hopefully it will perform well:

SELECT * FROM TableA a
JOIN abc_view v ON a.col1 IN (v.line, v.group, v.cat, v.test)
    AND v.type IN ('abc', 'bcd', 'cde', 'def')

Also add an index on a.col1 and on all four view columns (v.line, v.group, v.cat, v.test).
