SQL query to slice a dataset in Postgres

Time: 2017-11-25 07:44:34

Tags: sql postgresql

I need some help/pointers with a PostgreSQL query. I have a table containing system events.

events:

id|       tstamp      |device_type
----------------------------------
1 |2017-10-01 00:00:00|mobile
2 |2017-10-01 00:00:10|mobile
3 |2017-10-01 00:00:20|mobile
4 |2017-10-01 00:10:10|mobile
5 |2017-10-01 00:10:20|mobile

From this table, I can derive the lag between rows:

id|       tstamp      |device_type|lag_in_sec
---------------------------------------------
1 |2017-10-01 00:00:00|mobile     |
2 |2017-10-01 00:00:10|mobile     | 10
3 |2017-10-01 00:00:20|mobile     | 10
4 |2017-10-01 00:10:20|mobile     | 600
5 |2017-10-01 00:10:30|mobile     | 10

Now I want to slice this table wherever the lag is greater than 60 seconds and get the minimum and maximum tstamp within each slice, that is, one output row per slice.

Any pointers on how to achieve this are much appreciated.

Thanks!

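1 answer:

Answer 0: (score: 1)

The idea is to use a CASE expression, a subquery, and the SUM() OVER analytic function to split the records into groups based on the lag_in_sec value, like this: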

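-- Outer query: running count of gaps of at least 60 seconds, used as a group number.
SELECT tstamp,
       SUM( CASE WHEN lag_in_sec >= interval '60' second THEN 1 ELSE 0 END )
       OVER (ORDER BY tstamp) AS group_number
FROM (
   -- Inner query: gap to the previous row, as an interval (NULL for the first row).
   SELECT *, tstamp - lag( tstamp ) OVER (ORDER BY tstamp) AS lag_in_sec
   FROM Table1234
) x;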

+---------------------+--------------+
|       tstamp        | group_number |
+---------------------+--------------+
| 2017-10-01 00:00:00 |            0 |
| 2017-10-01 00:00:10 |            0 |
| 2017-10-01 00:00:20 |            0 |
| 2017-10-01 00:10:10 |            1 |
| 2017-10-01 00:10:20 |            1 |
+---------------------+--------------+
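To see why the running sum yields a group number, it can help to look at the intermediate columns side by side. The following is only a minimal sketch: it assumes the five sample rows from the question are supplied inline through a VALUES list, and the CTE names `events` and `lagged` as well as the column `starts_new_group` are illustrative names, not part of the original query.

```sql
-- Sample rows from the question, inlined so the query runs on its own (assumption).
WITH events (id, tstamp, device_type) AS (
   VALUES (1, timestamp '2017-10-01 00:00:00', 'mobile'),
          (2, timestamp '2017-10-01 00:00:10', 'mobile'),
          (3, timestamp '2017-10-01 00:00:20', 'mobile'),
          (4, timestamp '2017-10-01 00:10:10', 'mobile'),
          (5, timestamp '2017-10-01 00:10:20', 'mobile')
),
lagged AS (
   -- Gap to the previous row; this is an interval, and NULL for the first row.
   SELECT id, tstamp,
          tstamp - lag(tstamp) OVER (ORDER BY tstamp) AS lag_in_sec
   FROM events
)
SELECT id, tstamp, lag_in_sec,
       -- 1 marks a row that opens a new slice, 0 a row that continues the current one.
       CASE WHEN lag_in_sec >= interval '60' second THEN 1 ELSE 0 END AS starts_new_group,
       -- Running total of the markers: every row of a slice gets the same number.
       SUM(CASE WHEN lag_in_sec >= interval '60' second THEN 1 ELSE 0 END)
          OVER (ORDER BY tstamp) AS group_number
FROM lagged
ORDER BY tstamp;
```

The first row has no predecessor, so its lag is NULL and the CASE falls through to 0. The marker is 1 only on row 4, so the running sum stays at 0 for rows 1 to 3 and becomes 1 for rows 4 and 5.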

And then run a simple GROUP BY query with the MIN and MAX functions over the result of the above query:

-- One output row per group: first and last timestamp of each slice.
SELECT min( tstamp ) AS min_tstamp,
       max( tstamp ) AS max_tstamp
FROM (
   SELECT tstamp,
          SUM( CASE WHEN lag_in_sec >= interval '60' second THEN 1 ELSE 0 END )
          OVER (ORDER BY tstamp) AS group_number
   FROM (
      SELECT *, tstamp - lag( tstamp ) OVER (ORDER BY tstamp) AS lag_in_sec
      FROM Table1234
   ) x
) y
GROUP BY group_number
ORDER BY 1;

+---------------------+---------------------+
|     min_tstamp      |     max_tstamp      |
+---------------------+---------------------+
| 2017-10-01 00:00:00 | 2017-10-01 00:00:20 |
| 2017-10-01 00:10:10 | 2017-10-01 00:10:20 |
+---------------------+---------------------+
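If the table holds more than one device_type, the slices would probably need to be computed per device rather than across all rows. The variant below is a hedged sketch of that idea, not something the query above does: it assumes `Table1234` has the `device_type` column shown in the question, it uses a strict `>` comparison to match the "greater than 60" wording of the question, and the CTE names `lagged` and `grouped` and the column `gap` are illustrative.

```sql
WITH lagged AS (
   -- Gap to the previous event of the same device (assumes a device_type column).
   SELECT device_type, tstamp,
          tstamp - lag(tstamp) OVER (PARTITION BY device_type ORDER BY tstamp) AS gap
   FROM Table1234
),
grouped AS (
   -- Running count of gaps longer than 60 seconds, restarted for each device.
   SELECT device_type, tstamp,
          SUM(CASE WHEN gap > interval '60' second THEN 1 ELSE 0 END)
             OVER (PARTITION BY device_type ORDER BY tstamp) AS group_number
   FROM lagged
)
SELECT device_type,
       min(tstamp) AS min_tstamp,
       max(tstamp) AS max_tstamp
FROM grouped
GROUP BY device_type, group_number
ORDER BY device_type, min_tstamp;
```

Writing the steps as CTEs instead of nested subqueries is purely a readability choice; the grouping logic is the same running sum as in the query above.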

Live demo: DBFiddle link