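I have the following sample records:
log_id employee_id
12345 99999
12346 99999
12347 88888
12357 88888
When an employee_id is duplicated, how can I filter for the records with log_id = 12345 and 12346 (log_id values that are only 1 apart)? The output should be:
log_id employee_id
12345 99999
12346 99999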
Answer 0 (score: 4)
I wouldn't use window functions; I would just use exists.
Such a query should be able to take advantage of an index.
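A minimal sketch of what an exists-based filter could look like, run against an in-memory SQLite database through Python's sqlite3 module so it is easy to try. The table name my_table, the correlated subquery, and the composite index idx_emp_log on (employee_id, log_id) are assumptions made for illustration, not the answerer's own text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table (log_id integer primary key, employee_id integer not null)"
)
conn.executemany(
    "insert into my_table (log_id, employee_id) values (?, ?)",
    [(12345, 99999), (12346, 99999), (12347, 88888), (12357, 88888)],
)
# Assumed composite index; idx_emp_log is a hypothetical name.
conn.execute("create index idx_emp_log on my_table (employee_id, log_id)")

# Keep a row when the same employee also owns log_id - 1 or log_id + 1.
EXISTS_QUERY = """
select t.log_id, t.employee_id
from my_table t
where exists (
    select 1
    from my_table t2
    where t2.employee_id = t.employee_id
      and t2.log_id in (t.log_id - 1, t.log_id + 1)
)
order by t.log_id
"""

exists_rows = conn.execute(EXISTS_QUERY).fetchall()
print(exists_rows)
```

On the question's four sample rows this keeps only the two rows for employee 99999, since 12347 and 12357 have no same-employee neighbour one log_id away.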
Answer 1 (score: 0)
I would try something like this:
with
x as (
select log_id, employee_id, row_number() over(order by log_id) as rn
from my_table
),
y as (
select
log_id, employee_id, rn,
lag(log_id) over(order by rn) as prev_log_id,
lead(log_id) over(order by rn) as next_log_id
from x
)
select log_id, employee_id from y
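-- parentheses are required: "and" binds tighter than "or"
-- note: only neighbouring log_id values are compared; employee_id is not checked
where (log_id - 1 = prev_log_id or prev_log_id is null)
and (log_id + 1 = next_log_id or next_log_id is null)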
order by rn
Answer 2 (score: 0)
Another option:
with test (log_id, employee_id) as
  (select 12345, 99999 from dual union all
   select 12346, 99999 from dual union all
   select 12344, 99999 from dual union all  --> added this one
   --
   select 12347, 88888 from dual union all
   select 12357, 88888 from dual
  )
select log_id, employee_id
from test
where employee_id in (select employee_id
                      from test
                      group by employee_id
                      having max(log_id) - min(log_id) = count(*) - 1
                     );

    LOG_ID EMPLOYEE_ID
---------- -----------
     12344       99999
     12346       99999
     12345       99999
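The having condition above holds only when all of an employee's log_id values form one unbroken run, which may or may not be what is wanted. A small sketch of that behaviour, using Python's sqlite3 with an in-memory table in place of Oracle; the extra row (12342, 99999) is a hypothetical addition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table test (log_id integer primary key, employee_id integer not null)"
)

GROUP_QUERY = """
select log_id, employee_id
from test
where employee_id in (select employee_id
                      from test
                      group by employee_id
                      having max(log_id) - min(log_id) = count(*) - 1)
order by log_id
"""

conn.executemany(
    "insert into test (log_id, employee_id) values (?, ?)",
    [(12344, 99999), (12345, 99999), (12346, 99999), (12347, 88888), (12357, 88888)],
)
# All of 99999's rows are consecutive, so the employee qualifies.
unbroken = conn.execute(GROUP_QUERY).fetchall()

# Hypothetical extra row: same employee, but not adjacent to the run above.
conn.execute("insert into test (log_id, employee_id) values (12342, 99999)")
# Now max - min is 4 while count - 1 is 3, so 99999 drops out entirely.
with_gap = conn.execute(GROUP_QUERY).fetchall()

print(unbroken)
print(with_gap)
```

If an employee can have both a consecutive run and stray rows elsewhere, a per-row comparison of neighbours is likely a better fit than this per-employee test.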
Answer 3 (score: 0)
Assuming this table structure:
create table test (
log_id number(5) primary key,
employee_id number(5) not null
)
and this sample data:
insert into test
(select 12342, 99999 from dual union all
select 12343, 77777 from dual union all
select 12344, 99999 from dual union all
select 12345, 99999 from dual union all
select 12346, 99999 from dual union all
select 12347, 88888 from dual union all
-- gap
select 12357, 88888 from dual union all
select 12358, 33333 from dual union all
select 12359, 33333 from dual
)
you can do it with the following query:
with x as (
select log_id,
employee_id,
lead(log_id) over (order by log_id) as next_log_id,
lag(log_id) over (order by log_id) as previous_log_id,
lead(employee_id) over (order by log_id) as next_employee_id,
lag(employee_id) over (order by log_id) as previous_employee_id
from test
)
select log_id, employee_id
from x
where (log_id = next_log_id - 1 and employee_id = next_employee_id)
or (log_id = previous_log_id + 1 and employee_id = previous_employee_id)
order by 1
The result is:
LOG_ID | EMPLOYEE_ID
-------+------------
 12344 |       99999
 12345 |       99999
 12346 |       99999
 12358 |       33333
 12359 |       33333
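To try the lead/lag query without an Oracle instance, here is a sketch that loads the same sample rows into an in-memory SQLite database through Python's sqlite3 module. Using SQLite instead of Oracle is a substitution made here, and it assumes SQLite 3.25 or later, the first version with window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table test (log_id integer primary key, employee_id integer not null)"
)
conn.executemany(
    "insert into test (log_id, employee_id) values (?, ?)",
    [
        (12342, 99999), (12343, 77777), (12344, 99999),
        (12345, 99999), (12346, 99999), (12347, 88888),
        (12357, 88888),  # gap before this row
        (12358, 33333), (12359, 33333),
    ],
)

# Same query as in the answer: keep a row when the next or previous row
# (by log_id) is exactly 1 away and belongs to the same employee.
LEAD_LAG_QUERY = """
with x as (
    select log_id,
           employee_id,
           lead(log_id) over (order by log_id) as next_log_id,
           lag(log_id) over (order by log_id) as previous_log_id,
           lead(employee_id) over (order by log_id) as next_employee_id,
           lag(employee_id) over (order by log_id) as previous_employee_id
    from test
)
select log_id, employee_id
from x
where (log_id = next_log_id - 1 and employee_id = next_employee_id)
   or (log_id = previous_log_id + 1 and employee_id = previous_employee_id)
order by 1
"""

lead_lag_rows = conn.execute(LEAD_LAG_QUERY).fetchall()
for row in lead_lag_rows:
    print(row)
```

Rows at either end of the table have a null neighbour, so the comparison on that side is simply not satisfied and no special null handling is needed.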
If the sequence of LOG_ID values is guaranteed to have no gaps (as in the sample's range from 12342 to 12347), a simpler variant can be used:
with x as (
select log_id,
employee_id,
lead(employee_id) over (order by log_id) as next_employee_id,
lag(employee_id) over (order by log_id) as previous_employee_id
from test
where log_id between 12342 and 12347
)
select log_id, employee_id
from x
where employee_id in (previous_employee_id, next_employee_id)
order by 1
You can see it in action on this Oracle LiveSQL or this SQL Fiddle.