We are migrating logs from Oracle to Hadoop on AWS and querying them with Hive SQL.
The logs look like this:
Log_table
Err_Id  System_Id  Err_time  Err_text
1       System 1   23:54     Err1 other text Err1
2       System 2   02:12     Err1 other text Err2
3       System 3   22:10     Err1
4       System 2   02:37     Err2
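One of the desired outputs reports, by hour, the number of events for each system that contain a given error code. An error text that contains both Err1 and Err2 is therefore counted under both codes, but Err1 appearing twice within the same Err_Id is counted only once.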
Err 1
           Hour
System     0  1  2  3  ...  22  23
System 1   0  0  0  0  ...   0   1
System 2   0  0  1  0  ...   0   0
System 3   0  0  0  0  ...   1   0
Err 2
           Hour
System     0  1  2  3  ...  22  23
System 1   0  0  0  0  ...   0   0
System 2   0  0  2  0  ...   0   0
System 3   0  0  0  0  ...   0   0
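I could write multiple queries and either run them separately or combine them with a union, but although that is easy to write, it is inefficient. For example: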
select 'err1' as error_type,
       system_id,
       sum(case when hour(Err_time) = 0 then 1 else 0 end) as Hour00,
       sum(case when hour(Err_time) = 1 then 1 else 0 end) as Hour01,
       ...
from Log_table
where instr(Err_text, 'Err1') > 0
group by system_id
union all
select 'err2' as error_type,
       system_id,
       sum(case when hour(Err_time) = 0 then 1 else 0 end) as Hour00,
       sum(case when hour(Err_time) = 1 then 1 else 0 end) as Hour01,
       ...
from Log_table
where instr(Err_text, 'Err2') > 0
group by system_id
I could also run a single query that returns the data in the wrong shape and then re-pivot it externally. For example:
select system_id,
       hour(Err_time) as Err_hour,
       sum(case when instr(Err_text, 'Err1') > 0 then 1 else 0 end) as Err1,
       sum(case when instr(Err_text, 'Err2') > 0 then 1 else 0 end) as Err2,
       sum(case when instr(Err_text, 'Err3') > 0 then 1 else 0 end) as Err3
from Log_table
group by system_id,
         hour(Err_time)
Am I missing a concise, efficient way to do this in a single pass?
Answer 0 (score: 0)
This version is not more efficient, but it is more concise:
select e.error_type,
       l.system_id,
       sum(case when hour(l.Err_time) = 0 then 1 else 0 end) as Hour00,
       sum(case when hour(l.Err_time) = 1 then 1 else 0 end) as Hour01,
       ...
from Log_table l join
     -- instr is case-sensitive, so the codes are spelled as they appear in Err_text
     (select 'Err1' as error_type, 1 as ord union all
      select 'Err2' as error_type, 2 as ord
     ) e
     on instr(l.Err_text, e.error_type) > 0
group by e.error_type, e.ord, l.system_id
order by e.ord, l.system_id;
You can also write the from clause as:
from (select 'Err1' as error_type, l.*
      from Log_table l
      where instr(l.Err_text, 'Err1') > 0
      union all
      select 'Err2' as error_type, l.*
      from Log_table l
      where instr(l.Err_text, 'Err2') > 0
     ) l
That is, do the union all first and then aggregate once.
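The "fan out first, aggregate once" idea can also be sketched with Hive's lateral view explode, which reads Log_table a single time instead of once per error code. This is only a sketch under assumptions: the list of codes is hard-coded, Err_time is assumed to be stored in a form that hour() accepts, and the alias e is made up for illustration.

```sql
-- Sketch only: emit one row per (log row, error code it mentions), then aggregate once.
-- Assumptions: codes are listed by hand; hour() can parse Err_time as stored.
select e.error_type,
       l.system_id,
       sum(case when hour(l.Err_time) = 0 then 1 else 0 end) as Hour00,
       sum(case when hour(l.Err_time) = 1 then 1 else 0 end) as Hour01
       -- add one column per remaining hour, through Hour23
from Log_table l
lateral view explode(array('Err1', 'Err2')) e as error_type
-- instr is case-sensitive; a row mentioning Err1 twice still yields one row for Err1
where instr(l.Err_text, e.error_type) > 0
group by e.error_type, l.system_id;
```

Because each log row is paired with each code at most once, a text that repeats Err1 is counted once, while a text containing both Err1 and Err2 is counted under both. Note that a plain substring test would also match a longer code such as Err10, so a stricter pattern may be needed if such codes exist.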