Hive: counting rows multiple times and pivoting in the same query

Time: 2018-08-01 14:35:59

Tags: sql hive

We are migrating our logs from Oracle to Hadoop on AWS and querying them with Hive SQL.

The log looks like this:

Log_table
Err_Id  System_Id  Err_time  Err_text
1       System 1   23:54     Err1 other text Err1
2       System 2   02:12     Err1 other text Err2
3       System 3   22:10     Err1
4       System 2   02:37     Err2

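One of the desired outputs reports, per hour, the number of events for each system that contain a given error code. An error text containing both Err1 and Err2 is therefore counted for both codes, but Err1 appearing twice within the same err_id is counted only once.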

Err 1
System    Hour
          0  1  2  3 ...22 23 
System 1  0  0  0  0 ... 0  1  
System 2  0  0  1  0 ... 0  0  
System 3  0  0  0  0 ... 1  0  

Err 2
System    Hour
          0  1  2  3 ...22 23
System 1  0  0  0  0 ... 0  0
System 2  0  0  2  0 ... 0  0
System 3  0  0  0  0 ... 0  0

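I could create multiple queries and either run them separately or combine them with a union, but although that is easy to write, it is inefficient. For example: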

select 'err1' as error_type,
       system_id,
       sum(case when hour(Err_time)='00' then 1 else 0 end) as Hour00,
       sum(case when hour(Err_time)='01' then 1 else 0 end) as Hour01,
       ...
from Log_table
where instr(Err_text,'Err1')>0
group by 'err1', system_id
union
select 'err2' as error_type,
       system_id,
       sum(case when hour(Err_time)='00' then 1 else 0 end) as Hour00,
       sum(case when hour(Err_time)='01' then 1 else 0 end) as Hour01,
       ...
from Log_table
Where instr(Err_text,'Err2')>0
group by 'err2', system_id

I could also run a single query that returns the data in the wrong shape and then re-pivot it externally. For example:

select system_id,
       hour(Err_time) as Err_hour,
       sum(case when instr(Err_text,'Err1')>0 then 1 else 0 end) as Err1,
       sum(case when instr(Err_text,'Err2')>0 then 1 else 0 end) as Err2,
       sum(case when instr(Err_text,'Err3')>0 then 1 else 0 end) as Err3
from Log_table
group by system_id,
         hour(Err_time)

Am I missing a neat and efficient way to do this in one pass?

1 answer:

Answer 0: (score: 0)

This version is not more efficient, but it is more concise:

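select e.error_type,
       l.system_id,
       sum(case when hour(l.Err_time) = '00' then 1 else 0 end) as Hour00,
       sum(case when hour(l.Err_time) = '01' then 1 else 0 end) as Hour01,
       ...
from Log_table l join
     (select 'Err1' as error_type, 1 as ord union all
      select 'Err2' as error_type, 2 as ord
     ) e
     -- instr() is case-sensitive, so the codes must match the log text exactly;
     -- older Hive versions accept only equality conditions in ON, in which case
     -- use a cross join and move this condition to a where clause
     on instr(l.Err_text, e.error_type) > 0
group by e.error_type, e.ord, l.system_id
order by e.ord, l.system_id;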

You could also phrase the from clause as:

from (select 'err1' as error_type, l.*
      from Log_table l
      where instr(l.Err_text, 'Err1') > 0
      union all
      select 'err2' as error_type, l.*
      from Log_table l
      where instr(l.Err_text, 'Err2') > 0
     ) l

That is, do the union all first and then aggregate just once.
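As a rough sketch of how the pieces fit together, the query below applies the union all first and then a single aggregation to the four sample rows from the question. The CTE names log_sample and tagged are made up for illustration, the date part of err_time is an assumption (added so that hour() receives a full timestamp string), and only three of the 24 hour columns are written out.

```sql
-- Sample rows copied from the question. The date is an assumption; the
-- question only shows HH:mm values.
with log_sample as (
  select 1 as err_id, 'System 1' as system_id,
         '2018-08-01 23:54:00' as err_time, 'Err1 other text Err1' as err_text
  union all
  select 2 as err_id, 'System 2' as system_id,
         '2018-08-01 02:12:00' as err_time, 'Err1 other text Err2' as err_text
  union all
  select 3 as err_id, 'System 3' as system_id,
         '2018-08-01 22:10:00' as err_time, 'Err1' as err_text
  union all
  select 4 as err_id, 'System 2' as system_id,
         '2018-08-01 02:37:00' as err_time, 'Err2' as err_text
),
tagged as (
  -- One branch per error code. A row that mentions both codes shows up in
  -- both branches; a code repeated inside one row still gives one row.
  select 'err1' as error_type, l.system_id, l.err_time
  from log_sample l
  where instr(l.err_text, 'Err1') > 0
  union all
  select 'err2' as error_type, l.system_id, l.err_time
  from log_sample l
  where instr(l.err_text, 'Err2') > 0
)
-- Single aggregation over the tagged rows; the remaining hour columns
-- follow the same pattern.
select error_type,
       system_id,
       sum(case when hour(err_time) = 2  then 1 else 0 end) as hour02,
       sum(case when hour(err_time) = 22 then 1 else 0 end) as hour22,
       sum(case when hour(err_time) = 23 then 1 else 0 end) as hour23
from tagged
group by error_type, system_id
order by error_type, system_id;
```

Note that a system with no matching rows for a given code (for example System 1 for err2) is simply absent from the result rather than listed with zeros, so the all-zero lines in the desired report would have to be filled in separately.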