MySQL: return a proportional subset of rows

Time: 2014-12-01 22:53:14

Tags: mysql sql datetime

I have a query like this:

SELECT
    t.trial,
    count(*),
    MAX(d.value)
FROM data d
INNER JOIN trial_info t 
    ON d.instance_timestamp BETWEEN t.trial_start_timestamp AND t.trial_end_timestamp
GROUP BY t.trial;

which returns:

t.trial | count(*) | MAX(d.value)
---------------------------------
1       | 80       | 176
2       | 63       | 219
3       | 49       | 109
4       | 67       | 155

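Each row of d represents one second, so grouping on t.trial means that count(*) returns the number of seconds in each trial, and MAX(d.value) is the largest value observed during those seconds.

Question: how can I restrict the result to the middle 50% of each trial and get the MAX value within that window? I want to throw out the first 25% and the last 25% of each trial's duration. My subquery skills aren't so hot...

Here is my idea so far. Replacing the join with the one below works when computedValue is a static number. However, I want computedValue to be a percentage of the rows returned (the count(*) column in the result above).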

INNER JOIN trial_info t ON d.instance_timestamp BETWEEN
    DATE_ADD(t.trial_start_timestamp, INTERVAL computedValue SECOND) AND
    DATE_SUB(t.trial_end_timestamp, INTERVAL computedValue SECOND)

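1 Answer:

Answer 0 (score: 0)

Use TIMESTAMPDIFF to get the number of seconds between t.trial_start_timestamp and t.trial_end_timestamp, multiply it by 0.25, and use FLOOR to round the result down to an integer: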

FLOOR(TIMESTAMPDIFF(SECOND,t.trial_start_timestamp,t.trial_end_timestamp)*0.25)

So the join condition becomes:

INNER JOIN trial_info t ON d.instance_timestamp BETWEEN
    DATE_ADD(t.trial_start_timestamp, INTERVAL FLOOR(TIMESTAMPDIFF(SECOND,t.trial_start_timestamp,t.trial_end_timestamp)*0.25) SECOND) AND
    DATE_SUB(t.trial_end_timestamp, INTERVAL FLOOR(TIMESTAMPDIFF(SECOND,t.trial_start_timestamp,t.trial_end_timestamp)*0.25) SECOND)
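
To make the arithmetic in that join condition concrete, here is a minimal Python sketch of the same window calculation. The function name `middle_window` and the sample trial are hypothetical and are not part of the original query. The sketch only mirrors what TIMESTAMPDIFF, FLOOR, DATE_ADD and DATE_SUB are doing.

```python
from datetime import datetime, timedelta
import math

def middle_window(start, end, fraction=0.25):
    """Return (lo, hi) after trimming `fraction` of the duration from each end."""
    # Whole seconds between the two timestamps, like TIMESTAMPDIFF(SECOND, start, end)
    total_seconds = int((end - start).total_seconds())
    # Round down to an integer number of seconds, like FLOOR(... * 0.25)
    trim = math.floor(total_seconds * fraction)
    # DATE_ADD on the start, DATE_SUB on the end
    return start + timedelta(seconds=trim), end - timedelta(seconds=trim)

# Hypothetical trial: 80 one-second rows, so start and end are 79 seconds apart.
start = datetime(2014, 12, 1, 22, 0, 0)
end = datetime(2014, 12, 1, 22, 1, 19)
lo, hi = middle_window(start, end)
print(lo, hi)  # 2014-12-01 22:00:19 2014-12-01 22:01:00
```

With 79 seconds between start and end, 19 seconds are trimmed from each side, so rows from 22:00:19 through 22:01:00 inclusive would take part in the MAX.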

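Edit:

If your 50% is based on the number of rows in each trial rather than on the timestamps, the solution is to use LIMIT and OFFSET in a subquery to find the d.instance_timestamp values at the 25% and 75% positions of the trial's rows: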

d.instance_timestamp BETWEEN
    (SELECT d1.instance_timestamp FROM data d1
     WHERE d1.instance_timestamp BETWEEN t.trial_start_timestamp AND t.trial_end_timestamp ORDER BY d1.instance_timestamp ASC
     LIMIT 1 OFFSET (SELECT FLOOR(0.25 * COUNT(*)) FROM data d2
                     WHERE d2.instance_timestamp BETWEEN t.trial_start_timestamp AND t.trial_end_timestamp))
AND
    (SELECT d3.instance_timestamp FROM data d3
     WHERE d3.instance_timestamp BETWEEN t.trial_start_timestamp AND t.trial_end_timestamp ORDER BY d3.instance_timestamp ASC
     LIMIT 1 OFFSET (SELECT FLOOR(0.75 * COUNT(*)) FROM data d4
                     WHERE d4.instance_timestamp BETWEEN t.trial_start_timestamp AND t.trial_end_timestamp))
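
The intent of that condition can be sketched in a few lines of Python: sort the trial's timestamps, look up the values at the 25% and 75% positions, and keep everything between them. This is only an illustration. `middle_rows` and the sample data are hypothetical, and plain integers stand in for timestamps.

```python
import math

def middle_rows(timestamps):
    """Keep the rows whose timestamp lies between the 25% and 75% positions (inclusive)."""
    ordered = sorted(timestamps)
    n = len(ordered)
    if n == 0:
        return []
    lo = ordered[math.floor(0.25 * n)]  # ORDER BY ... LIMIT 1 OFFSET FLOOR(0.25 * COUNT(*))
    hi = ordered[math.floor(0.75 * n)]  # ORDER BY ... LIMIT 1 OFFSET FLOOR(0.75 * COUNT(*))
    # BETWEEN is inclusive on both ends
    return [ts for ts in ordered if lo <= ts <= hi]

# Hypothetical trial with 8 rows, using integers as stand-ins for timestamps.
kept = middle_rows([10, 11, 12, 13, 14, 15, 16, 17])
print(kept)  # [12, 13, 14, 15, 16]
```

Because BETWEEN includes both endpoints, 5 of the 8 rows survive here rather than exactly 4. Whether that off-by-one matters depends on how strictly "the middle 50%" has to be interpreted.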

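However, MySQL does not support subqueries as the values for LIMIT and OFFSET. To do this in MySQL, you need to use SUBSTRING_INDEX and GROUP_CONCAT to compute the instance_timestamp at the 25th and 75th percentiles: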

    -- FLOOR keeps the list position an integer, matching FLOOR(0.25 * COUNT(*)) in the OFFSET version
    d.instance_timestamp BETWEEN
    (SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(d1.instance_timestamp ORDER BY d1.instance_timestamp SEPARATOR ','), ',', FLOOR(0.25 * COUNT(*)) + 1), ',', -1)
     FROM data d1
     WHERE d1.instance_timestamp BETWEEN t.trial_start_timestamp AND t.trial_end_timestamp)
    AND
    (SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(d2.instance_timestamp ORDER BY d2.instance_timestamp SEPARATOR ','), ',', FLOOR(0.75 * COUNT(*)) + 1), ',', -1)
     FROM data d2
     WHERE d2.instance_timestamp BETWEEN t.trial_start_timestamp AND t.trial_end_timestamp)
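
The nested SUBSTRING_INDEX calls are easier to follow with a small emulation. The Python sketch below is a rough stand-in for MySQL's SUBSTRING_INDEX, not its actual implementation, and shows how the pair of calls picks the n-th item out of a comma-separated list. The helper names and the labels `t0` to `t7` are hypothetical. The position is rounded down here, which is an assumption about the intended rounding.

```python
import math

def substring_index(s, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX(s, delim, count)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter
        return delim.join(parts[:count])
    # Negative count: everything to the right of the |count|-th delimiter from the end
    return delim.join(parts[count:])

def nth_by_fraction(values, fraction):
    """Pick the value at the given fractional position of the sorted list."""
    joined = ",".join(sorted(values))            # GROUP_CONCAT(... ORDER BY ... SEPARATOR ',')
    n = math.floor(fraction * len(values)) + 1   # 1-based position in the list
    # Inner call keeps the first n items, outer call keeps the last of those
    return substring_index(substring_index(joined, ",", n), ",", -1)

# Hypothetical labels standing in for eight ordered timestamps.
stamps = ["t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7"]
print(nth_by_fraction(stamps, 0.25), nth_by_fraction(stamps, 0.75))  # t2 t6
```

One practical caveat: MySQL truncates GROUP_CONCAT output at group_concat_max_len, which defaults to 1024 bytes. For trials with many rows that setting may need to be raised, or the 75% position can fall outside the concatenated string.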