Calculating from table data

Time: 2016-06-01 16:03:02

Tags: sql oracle

I'm trying to calculate some counts and average time intervals from the data in a table:

+---------+--------+------+-----------+----------+
| STATS_I | TASK_I | TYPE | CREATE_TS | END_TS   |
+---------+--------+------+-----------+----------+
| 1       | 111    | A    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 2       | 111    | A    | 30-05-16  |          |
+---------+--------+------+-----------+----------+
| 3       | 111    | B    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 4       | 111    | B    | 30-05-16  |          |
+---------+--------+------+-----------+----------+
| 5       | 111    | C    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 6       | 222    | D    | 30-05-16  |          |
+---------+--------+------+-----------+----------+
| 7       | 222    | D    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 8       | 222    | C    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 9       | 222    | C    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 10      | 222    | C    | 30-05-16  |          |
+---------+--------+------+-----------+----------+
| 11      | 333    | A    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 12      | 333    | B    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 13      | 333    | B    | 30-05-16  | 31-05-16 |
+---------+--------+------+-----------+----------+
| 14      | 333    | D    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 15      | 333    | D    | 30-05-16  | 31-05-16 |
+---------+--------+------+-----------+----------+
| 16      | 444    | D    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 17      | 444    | D    | 30-05-16  | 31-05-16 |
+---------+--------+------+-----------+----------+
| 18      | 444    | C    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 19      | 444    | B    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+
| 20      | 444    | A    | 30-05-16  | 30-05-16 |
+---------+--------+------+-----------+----------+

The sample table can be populated with:

CREATE TABLE "STATS" ("STATS_I" NUMBER(10,0), "TASK_I" NUMBER(10,0), "TYPE" VARCHAR2(30), "CREATE_TS" DATE, "END_TS" DATE); 
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (1,111,'A',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (2,111,'A',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (3,111,'B',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (4,111,'B',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (5,111,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (6,222,'D',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (7,222,'D',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (8,222,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (9,222,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (10,222,'C',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (11,333,'A',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (12,333,'B',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (13,333,'B',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (14,333,'D',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (15,333,'D',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (16,444,'D',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (17,444,'D',to_date('30-05-16','DD-MM-RR'),null);
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (18,444,'C',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (19,444,'B',to_date('30-05-16','DD-MM-RR'),to_date('30-05-16','DD-MM-RR'));
Insert into STATS (STATS_I,TASK_I,TYPE,CREATE_TS,END_TS) values (20,444,'A',to_date('30-05-16','DD-MM-RR'),to_date('31-05-16','DD-MM-RR'));

- end_ts and create_ts are 10 minutes apart

The output I want to get is:

+--------+--------------+-------------------+-------------------+-------+
| Task_i | A            | B                 | C                 | D     |
+        +--------------+-------------------+-------------------+-------+
|        | Count | Read | Count | Avg. Time | Count | Avg. Time | Count |
|        |       |      |       |   to Read |       |   to Read |       |
+--------+-------+------+-------+-----------+-------+-----------+-------+
| 111    | 2     | 1    | 2     | 10 min    | 1     | 10 min    | 0     |
+--------+-------+------+-------+-----------+-------+-----------+-------+
| 222    | 0     | 0    | 0     | 0 min     | 3     | 10 min    | 2     |
+--------+-------+------+-------+-----------+-------+-----------+-------+
| 333    | 1     | 1    | 2     | 10 min    | 0     | 0 min     | 2     |
+--------+-------+------+-------+-----------+-------+-----------+-------+
| 444    | 1     | 1    | 1     | 10 min    | 1     | 10 min    | 2     |
+--------+-------+------+-------+-----------+-------+-----------+-------+

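Where, for each task_i:

  • 'Count' is the number of rows of that type;
  • 'Read' is the number of rows of that type where end_ts is not null;
  • 'Avg. Time to Read' is calculated as (end_ts - create_ts) / count of that type, ignoring rows where end_ts is null.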
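To pin the three definitions down, here is a small plain-Python sketch that computes them per (task_i, type) from the 20 sample INSERT rows. It is an illustration, not Oracle code. It assumes, following the note above, that every non-null end_ts is exactly 10 minutes after create_ts, and it reads 'Avg. Time to Read' as the average of end_ts - create_ts over the rows that have an end_ts. The names TASKS, TYPES, HAS_END, ROWS and summarize are hypothetical.

```python
from collections import defaultdict

# Sample rows 1-20 from the question's INSERTs: task_i, type, and whether END_TS is set.
TASKS = [111] * 5 + [222] * 5 + [333] * 5 + [444] * 5
TYPES = "AABBC" "DDCCC" "ABBDD" "DDCBA"
HAS_END = [1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1]

# Assumption: every non-null END_TS is exactly 10 minutes after CREATE_TS.
ROWS = [(task, typ, 10 if has_end else None)
        for task, typ, has_end in zip(TASKS, TYPES, HAS_END)]


def summarize(rows):
    """Map (task_i, type) -> (count, read, average minutes to read or None)."""
    groups = defaultdict(list)
    for task, typ, minutes in rows:
        groups[(task, typ)].append(minutes)
    summary = {}
    for key, values in groups.items():
        read = [m for m in values if m is not None]  # rows that have an END_TS
        avg = sum(read) / len(read) if read else None  # null END_TS rows are ignored
        summary[key] = (len(values), len(read), avg)
    return summary


summary = summarize(ROWS)
for key in sorted(summary):
    print(key, summary[key])
```

Combinations that have no rows at all (for example task 222 with type A) simply do not appear in the result; the expected output shows them as 0.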

So far I've tried creating four tables, one for each type, and then joining them on task_i:

(((select count(*) as A from stats s   where s.type='A' group by s.TYPE  UNION ALL
select count(*) as B from stats s   where s.type='B' group by s.TYPE ) UNION ALL
select count(*) as C from stats s   where s.type='C' group by s.TYPE ) UNION ALL
select count(*) as D from stats s   where s.type='D' group by s.TYPE ) ;

But this doesn't produce what I need:

         A
----------
         4
         5
         4
         7
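The single column in that result is expected: UNION ALL stacks the branches vertically, so all four counts land in one column named after the first branch (A), and nothing records which type or task_i each count belongs to. The sketch below reproduces this with Python's built-in sqlite3 module as a stand-in for Oracle, loading only the TYPE of the 20 sample INSERT rows; TYPES, column_names and counts are hypothetical names. With the rows exactly as posted, the four counts come out as 4, 5, 5 and 6.

```python
import sqlite3

# TYPE of sample rows 1-20 from the question's INSERTs.
TYPES = "AABBC" "DDCCC" "ABBDD" "DDCBA"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (stats_i INTEGER, type TEXT)")
conn.executemany("INSERT INTO stats VALUES (?, ?)", enumerate(TYPES, start=1))

# The question's query without the extra parentheses.
cur = conn.execute("""
    SELECT count(*) AS a FROM stats WHERE type = 'A' GROUP BY type
    UNION ALL
    SELECT count(*) AS b FROM stats WHERE type = 'B' GROUP BY type
    UNION ALL
    SELECT count(*) AS c FROM stats WHERE type = 'C' GROUP BY type
    UNION ALL
    SELECT count(*) AS d FROM stats WHERE type = 'D' GROUP BY type
""")
column_names = [desc[0] for desc in cur.description]
counts = [row[0] for row in cur.fetchall()]
# One column, named after the first branch, with one row per branch.
print(column_names, counts)
```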

What am I doing wrong, and how can I generate the output I want?

1 answer:

Answer (score: 0)

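The counts, which are the part you were attempting, can be found with conditional counting, using a case expression inside count(), rather than with a union: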

select task_i,
  count(case when type = 'A' then 1 end) as a_count,
  count(case when type = 'B' then 1 end) as b_count,
  count(case when type = 'C' then 1 end) as c_count,
  count(case when type = 'D' then 1 end) as d_count
from stats
group by task_i
order by task_i;

    TASK_I    A_COUNT    B_COUNT    C_COUNT    D_COUNT
---------- ---------- ---------- ---------- ----------
       111          2          2          1          0
       222          0          0          2          3
       333          1          2          0          2
       444          1          1          1          2
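The same conditional count can be tried out without an Oracle instance. This sketch uses Python's sqlite3 module as a stand-in and loads the task_i and TYPE of the 20 sample INSERT rows (TASKS and TYPES are hypothetical names). It relies on two things that hold in both databases: count() skips nulls, and a case expression with no else returns null when its condition is false. On the rows as posted, task 222 has three C rows and two D rows.

```python
import sqlite3

# task_i and TYPE of sample rows 1-20 from the question's INSERTs.
TASKS = [111] * 5 + [222] * 5 + [333] * 5 + [444] * 5
TYPES = "AABBC" "DDCCC" "ABBDD" "DDCBA"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (task_i INTEGER, type TEXT)")
conn.executemany("INSERT INTO stats VALUES (?, ?)", zip(TASKS, TYPES))

# Each CASE yields 1 for its own type and null otherwise; count() ignores the nulls,
# so every column counts just one type inside each task_i group.
rows = conn.execute("""
    SELECT task_i,
           count(CASE WHEN type = 'A' THEN 1 END) AS a_count,
           count(CASE WHEN type = 'B' THEN 1 END) AS b_count,
           count(CASE WHEN type = 'C' THEN 1 END) AS c_count,
           count(CASE WHEN type = 'D' THEN 1 END) AS d_count
    FROM stats
    GROUP BY task_i
    ORDER BY task_i
""").fetchall()
for row in rows:
    print(row)
```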

The number of 'reads' can be found by extending the case expression so it also checks that end_ts is not null:

select task_i,
  count(case when type = 'A' then 1 end) as a_count,
  count(case when type = 'A' and end_ts is not null then 1 end) as a_read,
  count(case when type = 'B' then 1 end) as b_count,
  count(case when type = 'B' and end_ts is not null then 1 end) as b_read,
  count(case when type = 'C' then 1 end) as c_count,
  count(case when type = 'C' and end_ts is not null then 1 end) as c_read,
  count(case when type = 'D' then 1 end) as d_count
from stats
group by task_i
order by task_i;

    TASK_I    A_COUNT     A_READ    B_COUNT     B_READ    C_COUNT     C_READ    D_COUNT
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       111          2          1          2          1          1          1          0
       222          0          0          0          0          2          1          3
       333          1          1          2          1          0          0          2
       444          1          1          1          1          1          1          2
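Because the select list is the same two expressions repeated per type, it can also be generated rather than typed out. The sketch below is one hedged way to do that from Python, again using sqlite3 as a stand-in for Oracle. The function conditional_counts_sql and the data names are hypothetical, and the end_ts value is a placeholder since only whether it is null matters for these counts. Type letters are pasted into the SQL text, which is only acceptable because they come from a fixed list of literals.

```python
import sqlite3

# task_i, TYPE and "END_TS is set" for sample rows 1-20 from the question's INSERTs.
TASKS = [111] * 5 + [222] * 5 + [333] * 5 + [444] * 5
TYPES = "AABBC" "DDCCC" "ABBDD" "DDCBA"
HAS_END = [1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1]


def conditional_counts_sql(types):
    """Build the repetitive select list: one *_count and one *_read column per type."""
    columns = ["task_i"]
    for t in types:
        # t comes from a fixed list of literals here, never from user input.
        columns.append(f"count(CASE WHEN type = '{t}' THEN 1 END) AS {t.lower()}_count")
        columns.append(
            f"count(CASE WHEN type = '{t}' AND end_ts IS NOT NULL THEN 1 END) AS {t.lower()}_read"
        )
    return ("SELECT " + ",\n       ".join(columns)
            + "\nFROM stats\nGROUP BY task_i\nORDER BY task_i")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (task_i INTEGER, type TEXT, end_ts TEXT)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    # Placeholder END_TS value; only null versus not null matters for these counts.
    [(task, typ, "2016-05-30" if has_end else None)
     for task, typ, has_end in zip(TASKS, TYPES, HAS_END)],
)
sql = conditional_counts_sql("ABCD")
rows = conn.execute(sql).fetchall()
print(sql)
for row in rows:
    print(row)
```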

The elapsed time can be found the same way, but with avg() instead of count():

select task_i,
  count(case when type = 'A' then 1 end) as a_count,
  count(case when type = 'A' and end_ts is not null then 1 end) as a_read,
  avg(case when type = 'A' and end_ts is not null then end_ts - create_ts end) as a_readtime,
  count(case when type = 'B' then 1 end) as b_count,
  count(case when type = 'B' and end_ts is not null then 1 end) as b_read,
  avg(case when type = 'B' and end_ts is not null then end_ts - create_ts end) as b_readtime,
  count(case when type = 'C' then 1 end) as c_count,
  count(case when type = 'C' and end_ts is not null then 1 end) as c_read,
  avg(case when type = 'C' and end_ts is not null then end_ts - create_ts end) as c_readtime,
  count(case when type = 'D' then 1 end) as d_count
from stats
group by task_i
order by task_i;

    TASK_I    A_COUNT     A_READ A_READTIME    B_COUNT     B_READ B_READTIME    C_COUNT     C_READ C_READTIME    D_COUNT
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       111          2          1     1.0035          2          1     1.0035          1          1     1.0035          0
       222          0          0                     0          0                     2          1     1.0035          3
       333          1          1     1.0035          2          1     1.0035          0          0                     2
       444          1          1     1.0035          1          1     1.0035          1          1     1.0035          2
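For comparison, here is a sketch of the same averaging in SQLite via Python's sqlite3 module. SQLite has no DATE subtraction, so julianday(end_ts) - julianday(create_ts) is used instead; like Oracle's DATE - DATE, it gives a number of days. The timestamps are assumptions: every row is created at 09:00 and every non-null end_ts is 10 minutes later, following the question's note. CREATED, ENDED and readtime_days are hypothetical names.

```python
import sqlite3

# task_i, TYPE and "END_TS is set" for sample rows 1-20 from the question's INSERTs.
TASKS = [111] * 5 + [222] * 5 + [333] * 5 + [444] * 5
TYPES = "AABBC" "DDCCC" "ABBDD" "DDCBA"
HAS_END = [1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1]
CREATED = "2016-05-30 09:00:00"  # assumed creation time
ENDED = "2016-05-30 09:10:00"    # assumption: 10 minutes after CREATED

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (task_i INTEGER, type TEXT, create_ts TEXT, end_ts TEXT)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?, ?)",
    [(task, typ, CREATED, ENDED if has_end else None)
     for task, typ, has_end in zip(TASKS, TYPES, HAS_END)],
)

# julianday() differences are in days; avg() skips the nulls produced for
# other types and for rows without an end_ts.
readtime_days = {
    task: (a, b, c)
    for task, a, b, c in conn.execute("""
        SELECT task_i,
               avg(CASE WHEN type = 'A' AND end_ts IS NOT NULL
                        THEN julianday(end_ts) - julianday(create_ts) END),
               avg(CASE WHEN type = 'B' AND end_ts IS NOT NULL
                        THEN julianday(end_ts) - julianday(create_ts) END),
               avg(CASE WHEN type = 'C' AND end_ts IS NOT NULL
                        THEN julianday(end_ts) - julianday(create_ts) END)
        FROM stats
        GROUP BY task_i
    """)
}
for task, days in sorted(readtime_days.items()):
    # 1 day = 24 * 60 minutes; None means there were no read rows of that type.
    print(task, [None if d is None else round(d * 24 * 60, 2) for d in days])
```

With these assumed timestamps every non-null average is 10 minutes, which matches the 10 min values in the expected output.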

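Subtracting one DATE from another gives a number of days, so those values are days and fractions of a day; 1.0035 is one day plus five minutes (5/1440 ≈ 0.0035). I modified your data slightly, adding five minutes to each end time, to be a bit closer to your original results. To get string output you need to manipulate the values a bit, which is a little less repetitive if you do the calculations in an inline view: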

select task_i,
  a_count,
  a_read,
  -- show the hours part only when there is at least one whole hour
  case when trunc(24 * a_readtime) >= 1 then trunc(24 * a_readtime) || ' hrs ' end
    || round(60 * mod(24 * nvl(a_readtime, 0), 1)) || ' min'
    as a_readtime
from (
  select task_i,
    count(case when type = 'A' then 1 end) as a_count,
    count(case when type = 'A' and end_ts is not null then 1 end) as a_read,
    avg(case when type = 'A' and end_ts is not null then end_ts - create_ts end) as a_readtime,
    count(case when type = 'B' then 1 end) as b_count,
    count(case when type = 'B' and end_ts is not null then 1 end) as b_read,
    avg(case when type = 'B' and end_ts is not null then end_ts - create_ts end) as b_readtime,
    count(case when type = 'C' then 1 end) as c_count,
    count(case when type = 'C' and end_ts is not null then 1 end) as c_read,
    avg(case when type = 'C' and end_ts is not null then end_ts - create_ts end) as c_readtime,
    count(case when type = 'D' then 1 end) as d_count
  from stats
  group by task_i
)
order by task_i;

    TASK_I    A_COUNT     A_READ A_READTIME    
---------- ---------- ---------- ---------------
       111          2          1 24 hrs 5 min   
       222          0          0 0 min            
       333          1          1 24 hrs 5 min   
       444          1          1 24 hrs 5 min   
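The outer query's string handling is easy to check outside the database. Below is a hypothetical Python helper, format_readtime, that takes a duration in days (as Oracle's date subtraction returns) and produces the same "N hrs M min" style, treating a missing value as 0 like nvl(a_readtime, 0). It is a variation, not a transcription: it rounds to whole minutes before splitting into hours and minutes, so it never produces "60 min". Note that Python's round() rounds exact halves to even, whereas Oracle's round() rounds them away from zero.

```python
def format_readtime(days):
    """Format a duration given in days as 'H hrs M min', or 'M min' under one hour."""
    if days is None:
        days = 0  # same role as nvl(a_readtime, 0)
    # Round to whole minutes first, then split, so 59.6 minutes becomes "1 hrs 0 min".
    total_minutes = round(days * 24 * 60)
    hours, minutes = divmod(total_minutes, 60)
    if hours >= 1:
        return f"{hours} hrs {minutes} min"
    return f"{minutes} min"


for value in (1.0035, 10 / 1440, 2 / 24 + 5 / 1440, None):
    print(value, "->", format_readtime(value))
```

The third sample value (2 hours 5 minutes) is the kind of input where the hours part has to be shown even though the duration is less than a day.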

...and repeat those outer select-list items for B, C and D.

If you're on 11g or later you can use pivot, which avoids repeating the conversion of the time difference to a string:

select * from (
  select task_i, type, type_count, type_read,
    case when trunc(24 * type_readtime) >= 1 then trunc(24 * type_readtime) || ' hrs ' end
      || round(60 * mod(24 * nvl(type_readtime, 0), 1)) || ' min'
    as type_readtime
  from (
    select task_i,
      type,
      count(*) as type_count,
      count(case when end_ts is not null then 1 end) as type_read,
      avg(case when end_ts is not null then end_ts - create_ts end) as type_readtime
    from stats
    group by task_i, type
  )
)
pivot (max(type_count) as count, max(type_read) as read, max(type_readtime) as readtime
  for (type) in ('A' as a, 'B' as b, 'C' as c, 'D' as d))
order by task_i;

    TASK_I    A_COUNT     A_READ A_READTIME         B_COUNT     B_READ B_READTIME         C_COUNT     C_READ C_READTIME         D_COUNT     D_READ D_READTIME    
---------- ---------- ---------- --------------- ---------- ---------- --------------- ---------- ---------- --------------- ---------- ---------- ---------------
       111          2          1 24 hrs 5 min             2          1 24 hrs 5 min             1          1 24 hrs 5 min                                         
       222                                                                                      2          1 24 hrs 5 min             3          2 24 hrs 5 min   
       333          1          1 24 hrs 5 min             2          1 24 hrs 5 min                                                   2          1 24 hrs 5 min   
       444          1          1 24 hrs 5 min             1          1 24 hrs 5 min             1          1 24 hrs 5 min             2          1 24 hrs 5 min   

...but if you don't want nulls you'll need to replace them with nvl() calls in the final select list, which makes it messier again.
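If the nulls are a concern and the results are going to application code anyway, another option is to run only the grouped inline view and do the pivot on the client, filling absent (task_i, type) combinations with a default. This is a hedged sketch of that alternative in Python with sqlite3 standing in for Oracle; pivot, long_rows and wide are hypothetical names, and the end_ts value is a placeholder since only null versus not null matters here.

```python
import sqlite3

# task_i, TYPE and "END_TS is set" for sample rows 1-20 from the question's INSERTs.
TASKS = [111] * 5 + [222] * 5 + [333] * 5 + [444] * 5
TYPES = "AABBC" "DDCCC" "ABBDD" "DDCBA"
HAS_END = [1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (task_i INTEGER, type TEXT, end_ts TEXT)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [(task, typ, "2016-05-30" if has_end else None)  # placeholder END_TS
     for task, typ, has_end in zip(TASKS, TYPES, HAS_END)],
)

# One row per (task_i, type), like the inline view before the PIVOT.
# count(end_ts) counts only non-null values, like count(case when end_ts is not null ...).
long_rows = conn.execute("""
    SELECT task_i, type, count(*) AS type_count, count(end_ts) AS type_read
    FROM stats
    GROUP BY task_i, type
""").fetchall()


def pivot(rows, types, missing=(0, 0)):
    """Turn (task, type, count, read) rows into {task: [(count, read) per type]}."""
    cells = {}
    for task, typ, type_count, type_read in rows:
        cells.setdefault(task, {})[typ] = (type_count, type_read)
    # Absent (task, type) combinations get a default instead of null, like nvl().
    return {task: [by_type.get(t, missing) for t in types]
            for task, by_type in sorted(cells.items())}


wide = pivot(long_rows, "ABCD")
for task, values in wide.items():
    print(task, values)
```

Keeping the SQL in long format means adding a new type needs no query change, at the cost of shaping the output outside the database.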