Selecting the middle 95% of a grouped result set in Postgres

Time: 2015-05-06 10:08:40

Tags: sql postgresql

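I currently have a table (below) that logs REST calls while testing an API. What I need to do is exclude the extremes (the top and bottom 2.5%) for each distinct restcallname within a specified time interval.

The closest I have come so far is the code below, which returns a table with the highest and lowest 2.5% of the entire result set excluded, rather than of each restcallname.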

     Column     |            Type             |       Modifiers
----------------+-----------------------------+------------------------
 timestamp      | timestamp without time zone | not null default now()
 testrunid      | character varying(255)      |
 sessionid      | character varying(255)      |
 restcallname   | character varying(255)      |
 completiontime | integer                     |


SELECT 
    restcallname, 
    count(restcallname) as noOfRestCalls, 
    round(avg(completiontime)) as avg_CompletionTime, 
    min(completiontime) as min_CompletionTime, 
    max(completiontime) as max_CompletionTime 
FROM (
    SELECT * 
    FROM requests
    WHERE 
        timestamp > NOW() - INTERVAL '1 week' 
    ORDER BY
        completiontime
    LIMIT (SELECT (COUNT(*) * 0.95)::integer FROM requests WHERE timestamp > NOW() - INTERVAL '1 week')
    OFFSET (SELECT (COUNT(*) * 0.025)::integer FROM requests WHERE timestamp > NOW() - INTERVAL '1 week')
) x
GROUP BY 
    restcallname 
ORDER BY 
    restcallname;

Any suggestions for solving this, or pointers to similar questions?

1 answer:

Answer 0: (score: 1)

I would be inclined to use window functions for this:

SELECT restcallname, 
       count(restcallname) as noOfRestCalls, 
       round(avg(completiontime)) as avg_CompletionTime, 
       min(completiontime) as min_CompletionTime, 
       max(completiontime) as max_CompletionTime 
FROM (SELECT r.*,
             ROW_NUMBER() OVER (ORDER BY completiontime) as seqnum,
             COUNT(*) OVER () as cnt
      FROM requests r
      WHERE  timestamp > NOW() - INTERVAL '1 week'
     ) r
WHERE seqnum >= 0.025 * cnt AND
      seqnum <= (1 - 0.025) * cnt
GROUP BY restcallname
ORDER BY restcallname;
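
Note that the window functions above have no PARTITION BY clause, so seqnum and cnt are computed over all rows in the interval and the query trims the overall 2.5% tails, much like the query in the question. A minimal sketch of a per-restcallname variant follows, assuming the same requests table and column names. The only changes are the PARTITION BY restcallname clauses and a strict lower bound, which is a choice made here so that both tails lose the same number of rows when 0.025 * cnt is a whole number.

```sql
SELECT restcallname,
       count(*) AS noOfRestCalls,
       round(avg(completiontime)) AS avg_CompletionTime,
       min(completiontime) AS min_CompletionTime,
       max(completiontime) AS max_CompletionTime
FROM (SELECT r.*,
             -- rank each call within its own restcallname, fastest first
             ROW_NUMBER() OVER (PARTITION BY restcallname
                                ORDER BY completiontime) AS seqnum,
             -- number of calls recorded for that restcallname in the interval
             COUNT(*) OVER (PARTITION BY restcallname) AS cnt
      FROM requests r
      WHERE timestamp > NOW() - INTERVAL '1 week'
     ) r
-- keep rows strictly above the bottom 2.5% and at or below the 97.5% mark
-- (strict lower bound is an assumption, not part of the original answer)
WHERE seqnum > 0.025 * cnt AND
      seqnum <= (1 - 0.025) * cnt
GROUP BY restcallname
ORDER BY restcallname;
```

With fewer than 40 rows for a given restcallname, 0.025 * cnt is below 1, so nothing is trimmed from the bottom of that group and at most one row from the top. Whether that is acceptable depends on how small the groups are.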