我目前一直在尝试从另一个表中不存在的表中提取这些值。但是,由于连接值包含空值 - not in,not exists和left join选项似乎不起作用。
因此,有没有办法在HiveQL中应用“不大于”条件?
作为参考,这是我运行的查询,类似于不存在并且左连接..
with date_prob as
(
select distinct visit
from t1
where dt=20161124
and dt1!=orig_ts
),
ev_data as
(
select distinct visit
from t1
where dt=20161124
and visit is not null
and origts is not null
and uid is not null
),
fin_data as
(
select x.visit
from ev_data x
where x.visit not in
(
select distinct visit
from date_prob
and visit is not null
)
)
我为左连接运行的查询 -
with date_prob as
(
select distinct id
from t1
where dt1='2016-11-24'
and dt1!=orig_ts
and (datediff(dt1,orig_ts) not in ('1','-1'))
),
ev_data as
(
select distinct id
from t1
where dt1='2016-11-24'
and id is not null
)
select x.id
from ev_data x
left join date_prob y
where y.id is null
;
数据示例 -
id dt1 orig_ts
1 2016-11-24 2016-11-10
2 2016-11-24 2016-11-24
3 2016-11-24 2010-01-01
4 2016-11-24 2017-01-01
5 2016-11-24 2016-11-24
6 2016-11-24 2016-11-25
7 2016-11-23 2016-11-23
因此,从这张表中我想删除那些存在大于一天差异的Id。因此,查询应仅在ID等于2,5和6的情况下返回值。
答案 0 :(得分:0)
如果要从另一个表中不存在的表中提取这些值,则可以使用left join
并过滤where second_table_key is null
。
即使键中有NULL,这也可以工作:
--this query will return records from table a that do not exist in b
select a.id
from a left join b on a.id=b.id
where b.id is null; --only not joined
修复了你的例子。它有效:
drop table if exists t1;
create table t1 (id int,dt1 string, orig_ts string );
insert overwrite table t1
select 1 id, '2016-11-24' dt1, '2016-11-10' orig_ts union all
select 2 id, '2016-11-24' dt1, '2016-11-24' orig_ts union all
select 3 id, '2016-11-24' dt1, '2010-01-01' orig_ts union all
select 4 id, '2016-11-24' dt1, '2017-01-01' orig_ts union all
select 5 id, '2016-11-24' dt1, '2016-11-24' orig_ts union all
select 6 id, '2016-11-24' dt1, '2016-11-25' orig_ts union all
select 7 id, '2016-11-23' dt1, '2016-11-23' orig_ts;
with date_prob as
(
select distinct id
from t1
where dt1='2016-11-24'
and dt1!=orig_ts
and (datediff(dt1,orig_ts) not in ('1','-1'))
),
ev_data as
(
select distinct id
from t1
where dt1='2016-11-24'
and id is not null
)
select x.id
from ev_data x
left join date_prob y on x.id=y.id
where y.id is null
;
OK
2
5
6
Time taken: 14.166 seconds, Fetched: 3 row(s)
hive>
按预期工作