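I have a very large table (called device_operation, with 50 million rows) that holds every operation performed on a product during its lifetime (e.g. "start", "stop", "refill", ...), the status of each operation (column status: Success or Failed), the ID of the associated device (column device_id), and a timestamp for each operation (column create_date).
Something like this: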
/------+-----------+------------------+---------\
| ID | Device ID | Create_Date | Status |
+------+-----------+------------------+---------+
| 1 | 1 | 2012-03-04 01:43 | Success |
| 2 | 4 | 2012-04-04 02:34 | Failed |
| 3 | 9 | 2013-01-01 01:23 | Failed |
| 4 | 4 | 2013-12-12 12:34 | Success |
| 5 | 23 | 2014-02-01 03:45 | Success |
| 6 | 1 | 2014-05-03 08:34 | Failed |
\------+-----------+------------------+---------/
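I also have another table (called subscription) that tells me when the warranty starts (column create_date) for each device (column device_id). The warranty lasts one year.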
/-----------+------------------\
| Device ID | Create_Date |
+-----------+------------------+
| 2 | 2011-04-03 05:00 |
| 4 | 2012-03-05 03:45 |
| 5 | 2012-03-05 06:07 |
| ... | ... |
\-----------+------------------/
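For anyone who wants to reproduce the setup, here is a minimal sketch of the two tables. The column types, the primary keys, and the index name are assumptions (the question only gives column names), and the composite index is one plausible way to support per-device, date-bounded lookups, not something the original schema is known to have.

```sql
-- Hypothetical DDL: types and keys are assumptions, not stated in the question.
CREATE TABLE subscription (
    device_id   bigint PRIMARY KEY,      -- assumes one subscription per device
    create_date timestamp NOT NULL       -- warranty start
);

CREATE TABLE device_operation (
    id          bigint PRIMARY KEY,
    device_id   bigint NOT NULL,
    create_date timestamp NOT NULL,      -- time of the operation
    status      text NOT NULL            -- 'Success' or 'Failed'
);

-- Assumed index: lets lookups by device and date range avoid a full table scan.
CREATE INDEX device_operation_device_date_idx
    ON device_operation (device_id, create_date);
```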
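I am using PostgreSQL.
I would like to do the following:
For each device, count its successful operations, its operations that failed during the warranty, and its operations that failed after the warranty.
I have had some limited success with the following (the query has been simplified a bit for readability: there are other joins needed to reach the subscription table, and other criteria for including a device in the list):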
SELECT distinct device_operation.device_id as did, subscription.create_date,
(
SELECT COUNT(*)
FROM device_operation dop
WHERE dop.device_id = device_operation.device_id and
dop.create_date > '2014-07-08' and
dop.status = 'Success'
) as success,
(
SELECT COUNT(*)
FROM device_operation dop2
WHERE
dop2.device_id = subscription.device_id and
dop2.create_date > '2014-07-08' and
dop2.status = 'Failed' and
dop2.create_date <= subscription.create_date + interval '1 year'
) as failed_during_warranty,
(
SELECT COUNT(*)
FROM device_operation dop2
WHERE
dop2.device_id = subscription.device_id and
dop2.create_date > '2014-07-08' and
dop2.status = 'Failed' and
dop2.create_date > subscription.create_date + interval '1 year'
) as failed_after_warranty
FROM device_operation, subscription
WHERE
device_operation.status = 'Success' and -- list operations which are successful
device_operation.create_date <= '2014-07-06' and -- list operations before that date
device_operation.device_id = subscription.device_id -- get warranty start for each operation
ORDER BY success DESC, failed_during_warranty DESC, failed_after_warranty DESC
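As you can guess, it is so slow that I cannot even run the query. But it gives you an idea of the structure.
I tried using NULLIF to merge the requests into a single subquery, hoping PostgreSQL would evaluate the subquery once instead of three times, but it returns "subquery must return only one column":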
SELECT distinct device_operation.device_id as did, subscription.create_date,
(
SELECT COUNT(NULLIF(dop2.status != 'Success', true)) as completed,
COUNT(NULLIF(dop2.status != 'Failed' or not (dop2.create_date <= subscription.create_date + interval '1 year'), true)) as failed_in_warranty,
COUNT(NULLIF(dop2.status != 'Failed' or (dop2.create_date <= subscription.create_date + interval '1 year'), true)) as failed_after_warranty
FROM device_operation dop2
WHERE
dop2.device_id = device_operation.device_id and
dop2.device_id = subscription.device_id and
dop2.create_date > '2014-07-08'
) as subq
FROM device_operation, subscription
WHERE
device_operation.status = 'Success' and -- list operations which are successful
device_operation.create_date <= '2014-07-06' and -- list operations before that date
device_operation.device_id = subscription.device_id -- get warranty start for each operation
ORDER BY success DESC, failed_in_warranty DESC, failed_outside_warranty DESC
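I also tried moving the subquery into the FROM clause, but that does not work, because I need to run the subquery for each row of the main query (or do I? There may be a better way).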
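One possible way to keep a per-row subquery in the FROM clause is a LATERAL join (PostgreSQL 9.3 or later), which may reference columns of tables that appear earlier in the FROM list and can return several columns at once. The following is only a sketch against the simplified tables shown earlier: the alias c and the EXISTS filter are illustrative choices, and it has not been run against the real data.

```sql
SELECT s.device_id,
       c.success,
       c.failed_during_warranty,
       c.failed_after_warranty
FROM subscription s
CROSS JOIN LATERAL (
    -- One pass over this device's operations after 2014-07-08,
    -- producing all three counts in a single row.
    SELECT
        COUNT(CASE WHEN dop.status = 'Success' THEN 1 END) AS success,
        COUNT(CASE WHEN dop.status = 'Failed'
                    AND dop.create_date <= s.create_date + interval '1 year'
                   THEN 1 END) AS failed_during_warranty,
        COUNT(CASE WHEN dop.status = 'Failed'
                    AND dop.create_date > s.create_date + interval '1 year'
                   THEN 1 END) AS failed_after_warranty
    FROM device_operation dop
    WHERE dop.device_id = s.device_id
      AND dop.create_date > timestamp '2014-07-08'
) c
-- Keep only devices with at least one successful operation up to 2014-07-06.
WHERE EXISTS (
    SELECT 1
    FROM device_operation d
    WHERE d.device_id = s.device_id
      AND d.status = 'Success'
      AND d.create_date <= timestamp '2014-07-06'
)
ORDER BY c.success DESC, c.failed_during_warranty DESC, c.failed_after_warranty DESC;
```

Because the inner query aggregates without GROUP BY, it always returns exactly one row, so CROSS JOIN LATERAL does not drop devices that have no operations after 2014-07-08; they simply get zero counts.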
What I expect is something like this:
/-----------+---------+------------------------+-----------------------\
| Device ID | Success | Failed during warranty | Failed after warranty |
+-----------+---------+------------------------+-----------------------+
| 194853 | 10 | 0 | 0 |
| 7853 | 5 | 5 | 0 |
| 5848 | 3 | 0 | 56 |
| 8546455 | 0 | 45 | 0 |
| 102 | 0 | 4 | 1 |
| 69329548 | 0 | 0 | 9 |
| 17 | 0 | 0 | 0 |
\-----------+---------+------------------------+-----------------------/
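Can someone help me find the most efficient way to do this?
EDIT: edge case: you can assume that every device has a subscription entry.
Thanks a lot!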
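Answer 0 (score: 0)
I think you just need conditional aggregation. I find the data structure and the logic a bit hard to follow, but I think the following is basically what you need: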
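-- The explicit timestamp keyword matters: an untyped literal added to an
-- interval would otherwise be parsed as an interval and raise an error.
-- Counts cover operations after 2014-07-08 (2014-07-06 + 2 days), as in the question.
SELECT d.device_id,
       SUM(CASE WHEN d.status = 'Failed' AND d.create_date > timestamp '2014-07-06' + interval '2 day'
                THEN 1 ELSE 0
           END) as NumFails,
       SUM(CASE WHEN d.status = 'Failed' AND d.create_date > timestamp '2014-07-06' + interval '2 day' AND
                     d.create_date > s.create_date + interval '1 year'
                THEN 1 ELSE 0
           END) as NumFailsNoWarranty,
       SUM(CASE WHEN d.status = 'Success' AND d.create_date > timestamp '2014-07-06' + interval '2 day'
                THEN 1 ELSE 0
           END) as NumSuccesses
FROM device_operation d JOIN
     subscription s
     ON d.device_id = s.device_id
GROUP BY d.device_id
-- keep only devices with at least one successful operation up to 2014-07-06
HAVING SUM(CASE WHEN d.status = 'Success' AND d.create_date <= timestamp '2014-07-06' THEN 1 ELSE 0 END) > 0;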
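On PostgreSQL 9.4 or later, the same conditional aggregation can also be written with the aggregate FILTER clause. The sketch below is a hedged variant, not the answerer's query: it borrows the column names from the question's expected output, counts operations after 2014-07-08 as the question's subqueries do, and assumes exactly one subscription row per device (otherwise the join would multiply the counts).

```sql
SELECT d.device_id,
       COUNT(*) FILTER (WHERE d.status = 'Success'
                          AND d.create_date > timestamp '2014-07-08') AS success,
       COUNT(*) FILTER (WHERE d.status = 'Failed'
                          AND d.create_date > timestamp '2014-07-08'
                          AND d.create_date <= s.create_date + interval '1 year') AS failed_during_warranty,
       COUNT(*) FILTER (WHERE d.status = 'Failed'
                          AND d.create_date > timestamp '2014-07-08'
                          AND d.create_date > s.create_date + interval '1 year') AS failed_after_warranty
FROM device_operation d
JOIN subscription s ON s.device_id = d.device_id
GROUP BY d.device_id
-- Keep only devices with at least one successful operation up to 2014-07-06.
HAVING COUNT(*) FILTER (WHERE d.status = 'Success'
                          AND d.create_date <= timestamp '2014-07-06') > 0
ORDER BY success DESC, failed_during_warranty DESC, failed_after_warranty DESC;
```

Either form reads device_operation once and groups by device, instead of running three correlated subqueries per output row.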