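Most efficient way to count matching rows with several criteria at once

Time: 2016-03-15 23:44:01

Tags: sql postgresql

I have a very large table (called device_operation, with 50 million rows) that stores every operation performed on a product during its lifetime (e.g. "start", "stop", "refill", ...), together with the status of each operation (column status: completed or failed, stored as 'Success' and 'Failed' in the sample below), the ID of the associated device (column device_id), and a timestamp for each operation (column create_date).

Something like this: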

/------+-----------+------------------+---------\
|   ID | Device ID | Create_Date      |  Status |
+------+-----------+------------------+---------+
|    1 |         1 | 2012-03-04 01:43 | Success |
|    2 |         4 | 2012-04-04 02:34 |  Failed |
|    3 |         9 | 2013-01-01 01:23 |  Failed |
|    4 |         4 | 2013-12-12 12:34 | Success |
|    5 |        23 | 2014-02-01 03:45 | Success |
|    6 |         1 | 2014-05-03 08:34 |  Failed |
\------+-----------+------------------+---------/

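I also have another table (called subscription) that tells me when the warranty (column create_date) of a product (column device_id) starts. The warranty lasts one year.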

/-----------+------------------\
| Device ID |      Create_Date |
+-----------+------------------+
|         2 | 2011-04-03 05:00 |
|         4 | 2012-03-05 03:45 |
|         5 | 2012-03-05 06:07 |
|       ... |              ... |
\-----------+------------------/

I am using PostgreSQL.
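To reproduce the setup, a minimal schema sketch might look like the following. The column types are assumptions (the question only shows column names and sample values), and the rows are copied from the two sample tables above.

```sql
-- Hypothetical schema: the column types are assumptions, not taken from the question.
CREATE TABLE subscription (
    device_id   integer   NOT NULL,
    create_date timestamp NOT NULL   -- warranty start; the warranty lasts one year
);

CREATE TABLE device_operation (
    id          bigint    NOT NULL,
    device_id   integer   NOT NULL,
    create_date timestamp NOT NULL,  -- when the operation was attempted
    status      text      NOT NULL   -- 'Success' or 'Failed'
);

-- Sample rows from the tables shown above.
INSERT INTO subscription (device_id, create_date) VALUES
    (2, '2011-04-03 05:00'),
    (4, '2012-03-05 03:45'),
    (5, '2012-03-05 06:07');

INSERT INTO device_operation (id, device_id, create_date, status) VALUES
    (1,  1, '2012-03-04 01:43', 'Success'),
    (2,  4, '2012-04-04 02:34', 'Failed'),
    (3,  9, '2013-01-01 01:23', 'Failed'),
    (4,  4, '2013-12-12 12:34', 'Success'),
    (5, 23, '2014-02-01 03:45', 'Success'),
    (6,  1, '2014-05-03 08:34', 'Failed');
```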

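I would like to do the following:

  • List all device IDs that had at least one successful operation before a given date (2014-07-06)

For each of these devices, count:

  • the number of failed operations after that date + 2 days (2014-07-08) where the device was under warranty when the operation was attempted
  • the number of failed operations after that date + 2 days (2014-07-08) where the device was out of warranty when the operation was attempted
  • the number of successful operations after that date (whether or not the device was under warranty)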
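The cut-off dates in the list above are plain date arithmetic. A small sketch, assuming the reference date is written as a date literal; the column aliases (reference_date, count_from, warranty_end) are made up for illustration:

```sql
SELECT s.device_id,
       date '2014-07-06'                     AS reference_date,
       date '2014-07-06' + interval '2 days' AS count_from,    -- 2014-07-08
       s.create_date + interval '1 year'     AS warranty_end   -- warranty lasts one year
FROM subscription s;
```

The explicit date keyword is deliberate: as far as I can tell, PostgreSQL would otherwise try to read a bare string literal next to an interval as an interval too, and reject it.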

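I have had some limited success with the following (the query has been simplified a bit for readability: there are additional joins to reach the subscription table, and other criteria for including devices in the list):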

SELECT distinct device_operation.device_id as did, subscription.create_date,
(
    SELECT COUNT(*)
    FROM device_operation dop
    WHERE dop.device_id = device_operation.device_id and
    dop.create_date > '2014-07-08' and
    dop.status = 'Success'
) as success,
(
    SELECT COUNT(*)
    FROM device_operation dop2
    WHERE
    dop2.device_id = subscription.device_id and
    dop2.create_date > '2014-07-08' and
    dop2.status = 'Failed' and
    dop2.create_date <= subscription.create_date + interval '1 year'
) as failed_during_warranty,
(
    SELECT COUNT(*)
    FROM device_operation dop2
    WHERE
    dop2.device_id = subscription.device_id and
    dop2.create_date > '2014-07-08' and
    dop2.status = 'Failed' and
    dop2.create_date > subscription.create_date + interval '1 year'
) as failed_after_warranty
FROM device_operation, subscription
WHERE
device_operation.status = 'Success' and -- list operations which are successful
device_operation.create_date <= '2014-07-06' and -- list operations before that date
device_operation.device_id = subscription.device_id -- get warranty start for each operation
ORDER BY success DESC, failed_during_warranty DESC, failed_after_warranty DESC

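As you can guess, it is so slow that I cannot run the query. But it gives you an idea of the structure.

I tried using NULLIF to merge the three subqueries into one, hoping that PostgreSQL would run the subquery once instead of three times, but it returns "subquery must return only one column":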

SELECT distinct device_operation.device_id as did, subscription.create_date,
(
SELECT COUNT(NULLIF(dop2.status != 'Success', true)) as completed, 
    COUNT(NULLIF(dop2.status != 'Failed' or not (dop2.create_date <= subscription.create_date + interval '1 year'), true)) as failed_in_warranty, 
    COUNT(NULLIF(dop2.status != 'Failed' or     (dop2.create_date <= subscription.create_date + interval '1 year'), true)) as failed_after_warranty
FROM device_operation dop2
WHERE
    dop2.device_id = device_operation.device_id and
    dop2.device_id = subscription.device_id and
    dop2.create_date > '2014-07-08'
) as subq
FROM device_operation, subscription
WHERE
device_operation.status = 'Success' and -- list operations which are successful
device_operation.create_date <= '2014-07-06' and -- list operations before that date
device_operation.device_id = subscription.device_id -- get warranty start for each operation
ORDER BY success DESC, failed_in_warranty DESC, failed_outside_warranty DESC

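I also tried moving the subquery into the FROM clause, but that does not work because I need the subquery to run for each row of the main query (or do I? There may be a better way).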
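One possible way around both obstacles, assuming PostgreSQL 9.3 or later: a LATERAL subquery in the FROM clause may reference columns of earlier FROM items and may return several columns. The sketch below has not been run against the real data; the alias names (devices, counts, warranty_start) are made up, and it assumes one subscription row per device.

```sql
SELECT devices.device_id AS did,
       counts.success,
       counts.failed_during_warranty,
       counts.failed_after_warranty
FROM (
    -- Devices with at least one successful operation on or before the reference date.
    SELECT DISTINCT dop.device_id, s.create_date AS warranty_start
    FROM device_operation dop
    JOIN subscription s ON s.device_id = dop.device_id
    WHERE dop.status = 'Success'
      AND dop.create_date <= '2014-07-06'
) AS devices
CROSS JOIN LATERAL (
    -- Evaluated once per device; returns all three counts from a single scan.
    SELECT COUNT(CASE WHEN dop2.status = 'Success' THEN 1 END) AS success,
           COUNT(CASE WHEN dop2.status = 'Failed'
                       AND dop2.create_date <= devices.warranty_start + interval '1 year'
                      THEN 1 END) AS failed_during_warranty,
           COUNT(CASE WHEN dop2.status = 'Failed'
                       AND dop2.create_date > devices.warranty_start + interval '1 year'
                      THEN 1 END) AS failed_after_warranty
    FROM device_operation dop2
    WHERE dop2.device_id = devices.device_id
      AND dop2.create_date > '2014-07-08'
) AS counts
ORDER BY counts.success DESC,
         counts.failed_during_warranty DESC,
         counts.failed_after_warranty DESC;
```

Because an aggregate without GROUP BY always yields exactly one row, the CROSS JOIN keeps devices whose counts are all zero.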

What I expect is something like this:

/-----------+---------+------------------------+-----------------------\
| Device ID | Success | Failed during warranty | Failed after warranty |
+-----------+---------+------------------------+-----------------------+
|    194853 |      10 |                      0 |                     0 |
|      7853 |       5 |                      5 |                     0 |
|      5848 |       3 |                      0 |                    56 |
|   8546455 |       0 |                     45 |                     0 |
|       102 |       0 |                      4 |                     1 |
|  69329548 |       0 |                      0 |                     9 |
|        17 |       0 |                      0 |                     0 |
\-----------+---------+------------------------+-----------------------/

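Can someone help me find the most efficient way to do this?

Edit (edge case): you can assume that every device has a subscription entry.

Thanks a lot!

1 Answer:

Answer 0: (score: 0)

I think you just need conditional aggregation. I find the data structure and the logic a bit hard to follow, but I think the following is basically what you need: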

SELECT d.device_id,
       -- failed operations after the reference date + 2 days
       SUM(CASE WHEN d.status = 'Failed' AND d.create_date > date '2014-07-06' + interval '2 day'
                THEN 1 ELSE 0
           END) as NumFails,
       -- failed operations after that date, once the one-year warranty has expired
       SUM(CASE WHEN d.status = 'Failed' AND d.create_date > date '2014-07-06' + interval '2 day' AND
                     d.create_date > s.create_date + interval '1 year'
                THEN 1 ELSE 0
           END) as NumFailsNoWarranty,
       -- successful operations after that date
       SUM(CASE WHEN d.status = 'Success' AND d.create_date > date '2014-07-06' + interval '2 day'
                THEN 1 ELSE 0
           END) as NumSuccesses
FROM device_operation d JOIN
     subscription s
     ON d.device_id = s.device_id
GROUP BY d.device_id
-- keep only devices with at least one success on or before the reference date
HAVING SUM(CASE WHEN d.status = 'Success' AND d.create_date <= '2014-07-06' THEN 1 ELSE 0 END) > 0;
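The same conditional aggregation can also be written with the aggregate FILTER clause, available from PostgreSQL 9.4. The sketch below is one reading of the question rather than a tested solution: it counts only operations after 2014-07-08, splits failures by the one-year warranty window, and keeps devices with a success on or before 2014-07-06. The output column names follow the question's expected result, and one subscription row per device is assumed.

```sql
SELECT d.device_id,
       COUNT(*) FILTER (WHERE d.status = 'Success'
                          AND d.create_date > date '2014-07-06' + interval '2 days')
           AS success,
       COUNT(*) FILTER (WHERE d.status = 'Failed'
                          AND d.create_date > date '2014-07-06' + interval '2 days'
                          AND d.create_date <= s.create_date + interval '1 year')
           AS failed_during_warranty,
       COUNT(*) FILTER (WHERE d.status = 'Failed'
                          AND d.create_date > date '2014-07-06' + interval '2 days'
                          AND d.create_date > s.create_date + interval '1 year')
           AS failed_after_warranty
FROM device_operation d
JOIN subscription s ON s.device_id = d.device_id
GROUP BY d.device_id
-- Keep only devices with at least one success on or before the reference date.
HAVING COUNT(*) FILTER (WHERE d.status = 'Success'
                          AND d.create_date <= date '2014-07-06') > 0
ORDER BY success DESC, failed_during_warranty DESC, failed_after_warranty DESC;
```

This reads device_operation once instead of once per device and per counter. Whether it is fast enough on 50 million rows depends on the available indexes and the resulting plan, so checking it with EXPLAIN ANALYZE would be a sensible next step.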