Postgres: Calculating the ratio of table entries that match a condition

Time: 2017-11-02 16:18:32

Tags: database postgresql

I have the following two tables in a PostgreSQL database:

dummy=# select * from employee;
 id | name  
----+-------
  1 | John
  2 | Susan
  3 | Jim
  4 | Sarah
(4 rows)

dummy=# select * from stats;
 id | arrival  |    day     | employee_id 
----+----------+------------+-------------
  2 | 08:31:34 | monday     |           2
  4 | 08:15:00 | monday     |           3
  5 | 08:43:00 | monday     |           4
  1 | 08:34:00 | monday     |           1
  7 | 08:29:00 | midweek    |           1
  8 | 08:31:00 | midweek    |           2
  9 | 08:10:00 | midweek    |           3
 10 | 08:40:00 | midweek    |           4
 11 | 08:28:00 | midweek    |           1
 12 | 08:33:00 | midweek    |           2
 14 | 08:21:00 | midweek    |           3
 15 | 08:45:00 | midweek    |           4
 16 | 08:25:00 | midweek    |           1
 17 | 08:35:00 | midweek    |           2
 18 | 08:44:00 | midweek    |           4
 19 | 08:10:00 | friday     |           1
 20 | 08:40:00 | friday     |           2
 21 | 08:30:00 | friday     |           3
 22 | 08:30:00 | friday     |           4
(19 rows)

I want to select all employees who arrived between 8:25 and 8:35 on midweek and friday. I can do that relatively easily with the following query:

SELECT * FROM stats
WHERE
    arrival >= (time '8:30' - interval '5 minutes')
AND
    arrival <= (time '8:30' + interval '5 minutes')
AND
    (day = 'midweek' or day = 'friday');

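However, an additional criterion is that I only want to select employees who arrived within the above time window at least 60% of the time. This is where I am stuck. I do not know how to calculate that ratio.

What does a query that satisfies all of these criteria look like?

Clarification

Apparently the description of the ratio above was misleading. When calculating the ratio, only rows that match the criterion (day = 'midweek' or day = 'friday') should be considered. So in the sample data, John and Susan each showed up for work four times on midweek and friday. Three of those four times they were on time. Hence the ratio for Susan and John is 75%.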

2 answers:

Answer 0: (score: 1)

Use common table expressions to calculate the required counts, for example:

with in_time as (
    select * 
    from stats
    where arrival >= (time '8:30' - interval '5 minutes')
    and arrival <= (time '8:30' + interval '5 minutes')
    and (day = 'midweek' or day = 'friday')
),
count_in_time as (
    select employee_id, count(*)
    from in_time
    group by employee_id
),
total_count as (
    select employee_id, count(*)
    from stats
    where day = 'midweek' or day = 'friday'
    group by employee_id
)
select 
    i.*, 
    c.count as in_time, 
    t.count as total_count, 
    round(c.count* 100.0/t.count, 2) as ratio
from in_time i
join count_in_time c using(employee_id) 
join total_count t using(employee_id);

Result:

 id | arrival  |   day   | employee_id | in_time | total_count | ratio 
----+----------+---------+-------------+---------+-------------+-------
 16 | 08:25:00 | midweek |           1 |       3 |           4 | 75.00
 11 | 08:28:00 | midweek |           1 |       3 |           4 | 75.00
  7 | 08:29:00 | midweek |           1 |       3 |           4 | 75.00
 17 | 08:35:00 | midweek |           2 |       3 |           4 | 75.00
 12 | 08:33:00 | midweek |           2 |       3 |           4 | 75.00
  8 | 08:31:00 | midweek |           2 |       3 |           4 | 75.00
 21 | 08:30:00 | friday  |           3 |       1 |           3 | 33.33
 22 | 08:30:00 | friday  |           4 |       1 |           4 | 25.00
(8 rows)

You can add an appropriate condition to the WHERE clause of the final query, for example on ratio, to keep only the employees you need.

If you only want aggregated data with the employees and their ratios, use count() with a filter.
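
The aggregated query that this answer refers to is not shown here. A minimal sketch of what it might look like, assuming PostgreSQL 9.4 or later (where the FILTER clause on aggregates is available) and the employee and stats tables from the question; the aliases e and s and the output column names are chosen for illustration only:

```sql
-- One row per employee: on-time arrivals, total arrivals, and the ratio,
-- counting only midweek and friday rows as the clarification requires.
select
    e.name,
    -- arrivals inside the 08:25-08:35 window
    count(*) filter (
        where s.arrival >= (time '8:30' - interval '5 minutes')
          and s.arrival <= (time '8:30' + interval '5 minutes')
    ) as in_time,
    -- all arrivals on the days under consideration
    count(*) as total_count,
    -- multiply by 100.0 to force numeric (not integer) division
    round(
        count(*) filter (
            where s.arrival >= (time '8:30' - interval '5 minutes')
              and s.arrival <= (time '8:30' + interval '5 minutes')
        ) * 100.0 / count(*),
        2
    ) as ratio
from stats s
join employee e on e.id = s.employee_id
where s.day in ('midweek', 'friday')
group by e.id, e.name
order by e.name;
```

With the sample data this should give 75.00 for John and Susan, 33.33 for Jim, and 25.00 for Sarah, matching the ratios in the result table of this answer. The day restriction sits in WHERE so that it limits the denominator, while the time window sits in FILTER so that it limits only the numerator.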

Answer 1: (score: 0)

You can get the arrival rate like this, for example:

SELECT name, 
AVG(CASE WHEN arrival >= (time '8:30' - interval '5 minutes') AND
        arrival <= (time '8:30' + interval '5 minutes') THEN 1 ELSE 0 END) AS arrival_rate
    FROM employee
    INNER JOIN stats ON stats.employee_id = employee.id
    GROUP BY name

To select only those whose rate is > 60%, just add a HAVING condition:

SELECT name, 
AVG(CASE WHEN arrival >= (time '8:30' - interval '5 minutes') AND
        arrival <= (time '8:30' + interval '5 minutes') THEN 1 ELSE 0 END) AS arrival_rate
    FROM employee
    INNER JOIN stats ON stats.employee_id = employee.id
    GROUP BY name
    HAVING AVG(CASE WHEN arrival >= (time '8:30' - interval '5 minutes') AND
        arrival <= (time '8:30' + interval '5 minutes') THEN 1 ELSE 0 END) > 0.6
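
Note that the queries in this answer average over every row in stats, including monday, whereas the clarification in the question asks for only midweek and friday rows to be counted. The following is a hedged variant, not the answerer's own query: it assumes the same tables, adds a WHERE clause on day, and uses >= rather than > because the question says "at least 60%". The aliases e and s are illustrative.

```sql
-- AVG over a 0/1 flag gives the share of on-time arrivals per employee.
-- The WHERE clause removes monday rows before the average is taken.
SELECT e.name,
       AVG(CASE WHEN s.arrival >= (time '8:30' - interval '5 minutes')
                 AND s.arrival <= (time '8:30' + interval '5 minutes')
                THEN 1 ELSE 0 END) AS arrival_rate
FROM employee e
INNER JOIN stats s ON s.employee_id = e.id
WHERE s.day IN ('midweek', 'friday')
GROUP BY e.id, e.name
-- keep employees who were on time at least 60% of the time
HAVING AVG(CASE WHEN s.arrival >= (time '8:30' - interval '5 minutes')
                 AND s.arrival <= (time '8:30' + interval '5 minutes')
                THEN 1 ELSE 0 END) >= 0.6;
```

On the sample data both versions select John and Susan, but the rates differ: without the day restriction each of them has 4 on-time arrivals out of 5 (0.80), while with it the rate is 3 out of 4 (0.75), which is the figure given in the question.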