Question

I have a large Oracle table (Oracle Database 12c Enterprise Edition Release 12.1.0.2.0), table_name, which is updated every 15 seconds. It has many columns, but the ones I care about are:

Name            Null?    Type                              
--------------- -------- --------------------------------- 
ID_1            NOT NULL NUMBER(38)                        
UTC_TIMESTAMP   NOT NULL TIMESTAMP(6) WITH TIME ZONE       
ID_2                     VARCHAR2(8)                       
SERVER_NAME              VARCHAR2(256)                     
ID_3                     NUMBER(38)                        
COUNT_1                  NUMBER(38)                        
COUNT_2                  NUMBER(38)

What I want to do:

1) Get all records where UTC_TIMESTAMP <= current_date and UTC_TIMESTAMP > current_date - 5 minutes (about 125K-150K records).

2) This data will contain duplicate ID_1 values, so for each ID_1 I want to keep only the record that has the max(UTC_TIMESTAMP) among its duplicates. That leaves one record per distinct ID_1.

What I tried: the following SQL

with temp_1 as (
select m.ID_2, m.ID_1, max(utc_timestamp) max_utc_timestamp
   from commsdesk.table_name m
   where m.ID_2 = 'TWC'
   group by m.ID_2, m.ID_1)
select f.utc_timestamp
  from commsdesk.table_name f
  join temp_1 t
    on t.max_utc_timestamp = f.utc_timestamp
   and t.ID_2 = f.ID_2
   and t.ID_1 = f.ID_1;

Problem: I can only get ID_2, ID_1 and UTC_TIMESTAMP, but I also want all the other columns. Is this possible with SQL?

There are about 2,200 distinct ID_1 values and about 125K-150K records in the 5-minute window. Doing this by copying the 125K-150K records into an Excel sheet and filtering on each of the 2,200 ID_1 values to find its maximum UTC_TIMESTAMP is impractical. However, if there is a quick way to do it with a macro, I could do that too.

Sample dummy data:

ID_2  SERVER_NAME   ID_3  ID_1  UTC_TIMESTAMP                           COUNT_1  COUNT_2
ABC   PQRS.ABC.TPO     2   303  24-JUL-17 03.41.55.000000000 PM +00:00        4        0
ABC   PQRS.ABC.TPO     2  1461  24-JUL-17 03.42.48.000000000 PM +00:00        1        7
ABC   PQRS.ABC.TPO     2     1  24-JUL-17 03.41.36.000000000 PM +00:00        2        3
ABC   PQRS.ABC.TPO     2  1461  24-JUL-17 03.41.16.000000000 PM +00:00        0        8
ABC   PQRS.ABC.TPO     1     1  24-JUL-17 03.41.11.000000000 PM +00:00        5        0
ABC   SRP.ROP.MTP      1     1  24-JUL-17 03.41.23.000000000 PM +00:00        0        0
ABC   SRP.ROP.MTP      2   303  24-JUL-17 03.41.34.000000000 PM +00:00        0        0
ABC   SRP.ROP.MTP      2  1461  24-JUL-17 03.41.31.000000000 PM +00:00        0        0
ABC   SRP.ROP.MTP      4   303  24-JUL-17 03.41.26.000000000 PM +00:00        4        8
ABC   SRP.ROP.MTP      2   303  24-JUL-17 03.41.20.000000000 PM +00:00        0        0
ABC   SRP.ROP.MTP      1  1461  24-JUL-17 03.41.01.000000000 PM +00:00        3        8
ABC   SRP.ROP.MTP      4     1  24-JUL-17 03.41.18.000000000 PM +00:00        9        1

Expected output: one row per distinct ID_1, namely the row with its latest UTC_TIMESTAMP, together with all of that row's other columns.

Answer 1

You can use the keep (dense_rank last ...) version of the max() aggregate function (or first and min, if you prefer), e.g.:

select id_1,
  max(utc_timestamp),
  max(id_2) keep (dense_rank last order by utc_timestamp) as id_2,
  max(server_name) keep (dense_rank last order by utc_timestamp) as server_name,
  max(id_3) keep (dense_rank last order by utc_timestamp) as id_3,
  max(count_1) keep (dense_rank last order by utc_timestamp) as count_1,
  max(count_2) keep (dense_rank last order by utc_timestamp) as count_2
from table_name
where utc_timestamp > current_timestamp - interval '5' minute
and utc_timestamp <= current_timestamp
group by id_1
order by id_1;

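The query groups by id_1, and max(utc_timestamp) is a 'normal' aggregate that gives the latest timestamp for each group, as you wanted. The other columns keep the values associated with the row that has the maximum timestamp for that id_1.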

With some dummy data:

insert into table_name (id_1, utc_timestamp, id_2, server_name, id_3, count_1, count_2)
values (1, systimestamp at time zone 'UTC' - interval '30' second, 'TWC', 'test1', 301, 1, 1);
insert into table_name (id_1, utc_timestamp, id_2, server_name, id_3, count_1, count_2)
values (1, systimestamp at time zone 'UTC' - interval '60' second, 'TWC', 'test2', 302, 2, 2);
insert into table_name (id_1, utc_timestamp, id_2, server_name, id_3, count_1, count_2)
values (1, systimestamp at time zone 'UTC' - interval '90' second, 'TWC', 'test3', 303, 3, 3);
insert into table_name (id_1, utc_timestamp, id_2, server_name, id_3, count_1, count_2)
values (2, systimestamp at time zone 'UTC' - interval '45' second, 'TWC', 'test4', 304, 4, 4);
insert into table_name (id_1, utc_timestamp, id_2, server_name, id_3, count_1, count_2)
values (2, systimestamp at time zone 'UTC' - interval '15' second, 'TWC', 'test5', 305, 5, 5);

The query gets the result:

      ID_1 MAX(UTC_TIMESTAMP)          ID_2     SERVE       ID_3    COUNT_1    COUNT_2
---------- --------------------------- -------- ----- ---------- ---------- ----------
         1 2017-07-21 18:38:22.944 UTC TWC      test1        301          1          1
         2 2017-07-21 18:38:38.399 UTC TWC      test5        305          5          5
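
One detail worth checking against your own data is what happens when two rows for the same id_1 share the latest timestamp. With keep (dense_rank last ...), every tied row is kept and each max() is then evaluated independently, so different columns can come from different rows. The following is only a sketch with hypothetical inline rows (the sample_rows name and its values are made up for illustration, not taken from the question):

```sql
-- Hypothetical rows: two rows for id_1 = 1 tie on the latest timestamp.
with sample_rows (id_1, utc_timestamp, count_1, count_2) as (
  select 1, timestamp '2017-07-24 15:41:36 +00:00', 2, 9 from dual
  union all
  select 1, timestamp '2017-07-24 15:41:36 +00:00', 5, 3 from dual
  union all
  select 1, timestamp '2017-07-24 15:41:11 +00:00', 7, 7 from dual
)
select id_1,
  max(utc_timestamp) as max_utc_timestamp,
  -- Both tied rows rank last, so each max() picks its own winner:
  -- count_1 comes from the second row, count_2 from the first.
  max(count_1) keep (dense_rank last order by utc_timestamp) as count_1,
  max(count_2) keep (dense_rank last order by utc_timestamp) as count_2
from sample_rows
group by id_1;
```

If id_1 and utc_timestamp are unique together, this situation never arises and each output row matches exactly one source row.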

You can get the same result with something more like your attempt:

with cte as (
  select id_1, max(utc_timestamp) max_utc_timestamp
  from table_name m
  where utc_timestamp > current_timestamp - interval '5' minute
  and utc_timestamp <= current_timestamp
  group by id_1
)
select t.id_1, t.utc_timestamp, t.id_2, t.server_name, t.id_3, t.count_1, t.count_2
from cte
join table_name t on t.id_1 = cte.id_1
and t.utc_timestamp = cte.max_utc_timestamp
order by t.id_1;

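... assuming the combination of id_1 and utc_timestamp is unique (I'm not sure why you were also joining on id_2; maybe that is needed for uniqueness?). But this will be less efficient, because it has to hit the real table twice: once to find the maximum timestamp for each id_1, and again in the join. It might be worth running both versions to compare the results and timings, as well as the execution plans.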
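
A different technique from either query above is the row_number() analytic function, which reads the table once and returns whole rows, so no per-column aggregate is needed. This is only a sketch under the same assumptions as before (same table and column names, same five-minute window); the rn alias is introduced here for illustration:

```sql
select id_1, utc_timestamp, id_2, server_name, id_3, count_1, count_2
from (
  select t.*,
    -- Number each id_1's rows from newest to oldest.
    row_number() over (partition by id_1 order by utc_timestamp desc) as rn
  from table_name t
  where utc_timestamp > current_timestamp - interval '5' minute
  and utc_timestamp <= current_timestamp
)
where rn = 1   -- keep only the newest row per id_1
order by id_1;
```

If two rows for an id_1 tie on the latest timestamp, row_number() picks one of them arbitrarily unless you add a tie-breaker to the order by; again, comparing execution plans and timings on your real data is the way to decide between the approaches.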

With your sample data (as updated on 2017-07-24), the first query above, modified only to use a fixed timestamp range that matches the data, gets:

      ID_1 MAX(UTC_TIMESTAMP)                ID_ SERVER_NAME        ID_3    COUNT_1    COUNT_2
---------- --------------------------------- --- ------------ ---------- ---------- ----------
         1 2017-07-24 15:41:36.000000 +00:00 ABC PQRS.ABC.TPO          2          2          3
       303 2017-07-24 15:41:55.000000 +00:00 ABC PQRS.ABC.TPO          2          4          0
      1461 2017-07-24 15:42:48.000000 +00:00 ABC PQRS.ABC.TPO          2          1          7

Or, dropping the columns you don't seem to be interested in:

select id_1,
  max(utc_timestamp),
  max(count_1) keep (dense_rank last order by utc_timestamp) as count_1,
  max(count_2) keep (dense_rank last order by utc_timestamp) as count_2
from table_name
where utc_timestamp > timestamp '2017-07-24 16:40:00 Europe/London' -- current_timestamp - interval '5' minute
and utc_timestamp <= timestamp '2017-07-24 16:45:00 Europe/London' -- current_timestamp
group by id_1
order by id_1;

      ID_1 MAX(UTC_TIMESTAMP)                   COUNT_1    COUNT_2
---------- --------------------------------- ---------- ----------
         1 2017-07-24 15:41:36.000000 +00:00          2          3
       303 2017-07-24 15:41:55.000000 +00:00          4          0
      1461 2017-07-24 15:42:48.000000 +00:00          1          7

And then taking it a step further:

select max(max_utc_timestamp) as max_utc_timestamp,
  sum(count_1) as sum_count_1,
  sum(count_2) as sum_count_2
from (
  select max(utc_timestamp) as max_utc_timestamp,
    max(count_1) keep (dense_rank last order by utc_timestamp) as count_1,
    max(count_2) keep (dense_rank last order by utc_timestamp) as count_2
  from table_name
  where utc_timestamp > timestamp '2017-07-24 16:40:00 Europe/London' -- current_timestamp - interval '5' minute
  and utc_timestamp <= timestamp '2017-07-24 16:45:00 Europe/London' -- current_timestamp
  group by id_1
);

MAX_UTC_TIMESTAMP                 SUM_COUNT_1 SUM_COUNT_2
--------------------------------- ----------- -----------
2017-07-24 15:42:48.000000 +00:00           7          10
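
The fixed ranges in the last few queries are written as Europe/London literals, while the sample rows are stored at +00:00. London was on UTC+1 on that date, so 16:40 to 16:45 local time corresponds to 15:40 to 15:45 UTC, which is the window the sample rows fall in. A quick way to confirm the conversion, as a sketch (the column aliases are just illustrative):

```sql
-- Convert the literal range bounds to UTC for comparison with the stored values.
select
  timestamp '2017-07-24 16:40:00 Europe/London' at time zone 'UTC' as range_start_utc,
  timestamp '2017-07-24 16:45:00 Europe/London' at time zone 'UTC' as range_end_utc
from dual;
```

Comparisons between TIMESTAMP WITH TIME ZONE values are made on the underlying instant, so the where clause works without converting either side explicitly.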

How to get distinct values of a column by finding the max timestamp of each, and then get the rest of the columns