Question

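I have a partitioned table in Hive, with partition columns e_year, e_month and e_day (all strings). The query runs fine if I hard-code these values, but when I try to make it more generic it hangs and times out. Can you tell me what the problem is? The table is 5-6 TB in size.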

Query:

select count(*),
       e_type,
       e_src_type
from   table1
where e_year=
      cast(substr(date_sub(From_unixtime(unix_timestamp()), 4),1,4) as string)
and   e_month=
      cast(substr(date_sub(From_unixtime(unix_timestamp()), 4),6,2) as string)
and  e_day=
     cast(substr(date_sub(From_unixtime(unix_timestamp()), 4),9,2) as string)
group by e_type,
         e_src_type

Query that works:

select  count(*),
        e_type, 
        e_src_type
from    table1
where   e_year='2015'
and     e_month='02'
and     e_day='02'
group by e_type,
        e_src_type

Answer 1

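You need to evaluate the partition values beforehand and pass them into the query as Hive variables, as shown below. With literal values Hive can prune partitions at compile time; the computed date expressions most likely prevent that, so the query ends up scanning the whole table.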

SET year='2015';
SET month='02';
SET day='02';

select  count(*),
        e_type, 
        e_src_type
from    table1
where   e_year=${hiveconf:year}
and     e_month=${hiveconf:month}
and     e_day=${hiveconf:day}
group by e_type,
        e_src_type;

Computing the variable values is a bit tricky; you can find a similar case here. GL!
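One way to compute the values outside Hive is sketched below in Python. This is only an illustration, not part of the original answer: the function name `set_statements` and the `days_back` parameter are hypothetical, and it assumes the partition values are zero-padded strings such as '02', as in the hard-coded query.

```python
from datetime import date, timedelta


def set_statements(today: date, days_back: int = 4) -> str:
    """Return Hive SET lines for the partition `days_back` days before `today`."""
    target = today - timedelta(days=days_back)
    # Zero-padded strings, matching partition values such as e_month='02'.
    values = {
        "year": f"{target.year:04d}",
        "month": f"{target.month:02d}",
        "day": f"{target.day:02d}",
    }
    # The quotes are part of the value, as in SET year='2015'; above.
    return "\n".join(f"SET {name}='{value}';" for name, value in values.items())


if __name__ == "__main__":
    # Prepend this output to the query script before passing it to Hive.
    print(set_statements(date.today()))
```

The printed lines can be placed at the top of the query file in place of the hard-coded SET statements, and the file run with hive -f. Because the values arrive as plain literals, the query has the same shape as the hard-coded one that works.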
