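I have a PostgreSQL database and I am trying to sum up the revenue of cash registers. A cash register can have the status ACTIVE or INACTIVE, but I only want to sum up the earnings created while it was ACTIVE within a given period of time.
I have two tables; one records the revenue and the other records the cash register status: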
CREATE TABLE counters
(
id bigserial NOT NULL,
"timestamp" timestamp with time zone,
total_revenue bigint,
id_of_machine character varying(50),
CONSTRAINT counters_pkey PRIMARY KEY (id)
);
CREATE TABLE machine_lifecycle_events
(
id bigserial NOT NULL,
event_type character varying(50),
"timestamp" timestamp with time zone,
id_of_affected_machine character varying(50),
CONSTRAINT machine_lifecycle_events_pkey PRIMARY KEY (id)
);
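A counters entry is added every minute, and total_revenue only increases. A machine_lifecycle_events entry is added every time the machine status changes.
I have added an image illustrating the problem. It is the revenue in the blue periods that should be summed up.
I have created a query that gives me the total revenue at a given instant: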
SELECT total_revenue
FROM counters
WHERE timestamp < '2014-03-05 11:00:00'
AND id_of_machine='1'
ORDER BY
timestamp desc
LIMIT 1
Any ideas on how to attack this problem?
Example data:
INSERT INTO counters VALUES
(1, '2014-03-01 00:00:00', 100, '1')
, (2, '2014-03-01 12:00:00', 200, '1')
, (3, '2014-03-02 00:00:00', 300, '1')
, (4, '2014-03-02 12:00:00', 400, '1')
, (5, '2014-03-03 00:00:00', 500, '1')
, (6, '2014-03-03 12:00:00', 600, '1')
, (7, '2014-03-04 00:00:00', 700, '1')
, (8, '2014-03-04 12:00:00', 800, '1')
, (9, '2014-03-05 00:00:00', 900, '1')
, (10, '2014-03-05 12:00:00', 1000, '1')
, (11, '2014-03-06 00:00:00', 1100, '1')
, (12, '2014-03-06 12:00:00', 1200, '1')
, (13, '2014-03-07 00:00:00', 1300, '1')
, (14, '2014-03-07 12:00:00', 1400, '1');
INSERT INTO machine_lifecycle_events VALUES
(1, 'ACTIVE', '2014-03-01 08:00:00', '1')
, (2, 'INACTIVE', '2014-03-03 00:00:00', '1')
, (3, 'ACTIVE', '2014-03-05 00:00:00', '1')
, (4, 'INACTIVE', '2014-03-06 12:00:00', '1');
Example query:
The revenue between '2014-03-02 08:00:00' and '2014-03-06 08:00:00' is 300: 100 in the first ACTIVE period and 200 in the second ACTIVE period.
Answer 0: (score: 2)
To make my work easier, I sanitized your database design before tackling the questions:
CREATE TEMP TABLE counter (
id bigserial PRIMARY KEY
, ts timestamp NOT NULL
, total_revenue bigint NOT NULL
, machine_id int NOT NULL
);
CREATE TEMP TABLE machine_event (
id bigserial PRIMARY KEY
, ts timestamp NOT NULL
, machine_id int NOT NULL
, status_active bool NOT NULL
);
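To try the queries in this answer against the simplified layout, the sample data from the question could be loaded roughly like this. This is only a sketch: the conversion of '1' to the integer 1 and of ACTIVE / INACTIVE to true / false is my reading of the layout above, not something spelled out in the answer.

```sql
-- Sample data from the question, converted to the simplified layout.
-- Assumption: machine '1' becomes the integer 1.
INSERT INTO counter (id, ts, total_revenue, machine_id) VALUES
  (1,  '2014-03-01 00:00',  100, 1)
, (2,  '2014-03-01 12:00',  200, 1)
, (3,  '2014-03-02 00:00',  300, 1)
, (4,  '2014-03-02 12:00',  400, 1)
, (5,  '2014-03-03 00:00',  500, 1)
, (6,  '2014-03-03 12:00',  600, 1)
, (7,  '2014-03-04 00:00',  700, 1)
, (8,  '2014-03-04 12:00',  800, 1)
, (9,  '2014-03-05 00:00',  900, 1)
, (10, '2014-03-05 12:00', 1000, 1)
, (11, '2014-03-06 00:00', 1100, 1)
, (12, '2014-03-06 12:00', 1200, 1)
, (13, '2014-03-07 00:00', 1300, 1)
, (14, '2014-03-07 12:00', 1400, 1);

-- Assumption: 'ACTIVE' maps to true and 'INACTIVE' to false.
INSERT INTO machine_event (id, ts, machine_id, status_active) VALUES
  (1, '2014-03-01 08:00', 1, true)
, (2, '2014-03-03 00:00', 1, false)
, (3, '2014-03-05 00:00', 1, true)
, (4, '2014-03-06 12:00', 1, false);
```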
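What I changed:
Use ts instead of "timestamp". Never use basic type names as column names.
Simplified the machine column to machine_id and made it integer instead of varchar(50).
event_type varchar(50) should also be an integer foreign key, or an enum. Or even just a boolean for active / inactive only. Simplified to status_active bool.
Adapted the INSERT statements accordingly.
What I assume:
total_revenue only increases (per the question).
Every "next" row per machine in machine_event has the opposite status_active.
1. How do I calculate the revenue between two timestamps?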
WITH span AS (
SELECT '2014-03-02 12:00'::timestamp AS s_from -- start of time range
, '2014-03-05 11:00'::timestamp AS s_to -- end of time range
)
SELECT machine_id, s.s_from, s.s_to
, max(total_revenue) - min(total_revenue) AS earned
FROM counter c
, span s
WHERE ts BETWEEN s_from AND s_to -- borders included!
AND machine_id = 1
GROUP BY 1,2,3;
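2. How do I determine the start and end timestamps of the blue periods, when I have to compare the timestamps in machine_event with the input period?
This query returns the periods for all machines in the given time frame (span).
Add WHERE machine_id = 1 in the CTE cte to select a specific machine.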
WITH span AS (
SELECT '2014-03-02 08:00'::timestamp AS s_from -- start of time range
, '2014-03-06 08:00'::timestamp AS s_to -- end of time range
)
, cte AS (
SELECT machine_id, ts, status_active, s_from
, lead(ts, 1, s_to) OVER w AS period_end
, first_value(ts) OVER w AS first_ts
FROM span s
JOIN machine_event e ON e.ts BETWEEN s.s_from AND s.s_to
WINDOW w AS (PARTITION BY machine_id ORDER BY ts)
)
SELECT machine_id, ts AS period_start, period_end -- start in time frame
FROM cte
WHERE status_active
UNION ALL -- active start before time frame
SELECT machine_id, s_from, ts
FROM cte
WHERE NOT status_active
AND ts = first_ts
AND ts <> s_from
UNION ALL -- active start before time frame, no end in time frame
SELECT machine_id, s_from, s_to
FROM (
SELECT DISTINCT ON (1)
e.machine_id, e.status_active, s.s_from, s.s_to
FROM span s
JOIN machine_event e ON e.ts < s.s_from -- only from before time range
LEFT JOIN cte c USING (machine_id)
WHERE c.machine_id IS NULL -- not in selected time range
ORDER BY e.machine_id, e.ts DESC -- only the latest entry
) sub
WHERE status_active -- only if active
ORDER BY 1, 2;
The result is the list of blue periods in your image. SQL Fiddle demonstrating both.
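The two queries are shown separately; one way they might be combined is to join each blue period back to counter and take max minus min per period. The sketch below hard-codes the two periods for machine 1 in a CTE named period (a hypothetical name) instead of repeating the period query, and it assumes the question's sample data has been loaded into counter.

```sql
-- Assumption: the periods are hard-coded for brevity; in practice they would
-- come from the period query above, wrapped in another CTE.
WITH period (machine_id, period_start, period_end) AS (
   VALUES (1, timestamp '2014-03-02 08:00', timestamp '2014-03-03 00:00')
        , (1, timestamp '2014-03-05 00:00', timestamp '2014-03-06 08:00')
)
SELECT p.machine_id, p.period_start, p.period_end
     , max(c.total_revenue) - min(c.total_revenue) AS earned
FROM   period  p
JOIN   counter c ON  c.machine_id = p.machine_id
                 AND c.ts BETWEEN p.period_start AND p.period_end  -- borders included
GROUP  BY 1, 2, 3
ORDER  BY 1, 2;
```

With the sample data this gives 100 for the first period and 200 for the second, matching the 300 expected in the question. A period that contains no counter row at all drops out of the result because of the inner join.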
Recent similar question:
Sum of time difference between rows
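Answer 1: (score: 0)
I am assuming that the id of machine_lifecycle_events can be used to determine successor and predecessor. So, to make my solution work better, you should have a link between the ACTIVE and INACTIVE events. There may be other ways to solve it, but those would add even more complexity.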
First, to get the revenue of all active periods for each machine, you can do the following:
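select c.id_of_machine, cycle_id, cycle_start, cycle_end,
       max(c.total_revenue) - min(c.total_revenue) as earned -- total_revenue is a running total, so take the difference, not sum()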
from counters c join (
select e1.id as cycle_id,
e1.timestamp as cycle_start,
e2.timestamp as cycle_end,
e1.id_of_affected_machine as cycle_machine_id
from machine_lifecycle_events e1 join machine_lifecycle_events e2
on e1.id + 1 = e2.id and -- this should be replaced with a specific column to find cycles which belong together
e1.id_of_affected_machine = e2.id_of_affected_machine
where e1.event_type = 'ACTIVE'
) cycle
on c.id_of_machine = cycle_machine_id and
cycle_start <= c.timestamp and c.timestamp <= cycle_end
group by c.id_of_machine, cycle_id, cycle_start, cycle_end
order by c.id_of_machine, cycle_id
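You can take this query further and add more conditions in the where clause to get the revenue only within a time frame or for specific machines: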
select sum(earned) as earned
from (
    -- total_revenue is a running total, so take max - min per cycle instead of sum()
    select max(c.total_revenue) - min(c.total_revenue) as earned
    from counters c join (
        select e1.id as cycle_id,
               e1.timestamp as cycle_start,
               e2.timestamp as cycle_end,
               e1.id_of_affected_machine as cycle_machine_id
        from machine_lifecycle_events e1 join machine_lifecycle_events e2
        on e1.id + 1 = e2.id and -- this should be replaced with a specific column to find cycles which belong together
           e1.id_of_affected_machine = e2.id_of_affected_machine
        where e1.event_type = 'ACTIVE'
    ) cycle
    on c.id_of_machine = cycle_machine_id and
       cycle_start <= c.timestamp and c.timestamp <= cycle_end
    where '2014-03-02 08:00:00' <= c.timestamp and c.timestamp <= '2014-03-06 08:00:00'
      and c.id_of_machine = '1'
    group by cycle_id
) per_cycle
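As mentioned at the beginning and in the comments, my way of finding connected events does not work for any more complex example with multiple machines. The easiest fix would be another column that always points to the preceding event. Another way would be a function that finds these events, but that solution could not make use of indexes.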
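A third option, which is not part of this answer, might be to pair each event with the next event of the same machine by time, using the lead() window function. The sketch below only builds the cycles; the column aliases follow the subquery above, and cycle_end is NULL for a machine that is still active.

```sql
-- Pair each event with the next event of the same machine by time,
-- instead of relying on consecutive ids (e1.id + 1 = e2.id).
select cycle_machine_id, cycle_start, cycle_end
from (
    select id_of_affected_machine as cycle_machine_id,
           event_type,
           "timestamp" as cycle_start,
           lead("timestamp") over (partition by id_of_affected_machine
                                   order by "timestamp") as cycle_end
    from machine_lifecycle_events
) e
where event_type = 'ACTIVE'; -- keep only the intervals that start with ACTIVE
```

Like the rest of the thread, this assumes that ACTIVE and INACTIVE events strictly alternate per machine.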
Answer 2: (score: 0)
Use a self-join to build a table of intervals together with the actual status of each interval.
with intervals as (
select e1.timestamp time1, e2.timestamp time2, e1.EVENT_TYPE as status
from machine_lifecycle_events e1
left join machine_lifecycle_events e2 on e2.id = e1.id + 1
) select * from counters c
join intervals i on c.timestamp >= i.time1 and (c.timestamp <= i.time2 or i.time2 is null) -- open-ended interval when there is no later event
and i.status = 'ACTIVE';
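I did not use aggregation, so that the result set stays visible; I think you can add that yourself. I also left out the machine id to simplify the demonstration of the pattern.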
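The aggregation and the machine id that this answer leaves out might be filled in roughly as follows. This is only a sketch: it keeps the answer's assumption that consecutive ids belong together, the alias machine is hypothetical, and it uses max minus min because total_revenue is a running total.

```sql
with intervals as (
    select e1.id_of_affected_machine as machine,
           e1."timestamp" as time1,
           e2."timestamp" as time2,   -- NULL when there is no later event
           e1.event_type  as status
    from machine_lifecycle_events e1
    left join machine_lifecycle_events e2
           on e2.id = e1.id + 1       -- assumption inherited from the answer
          and e2.id_of_affected_machine = e1.id_of_affected_machine
)
select i.machine, i.time1, i.time2,
       max(c.total_revenue) - min(c.total_revenue) as earned
from counters c
join intervals i
  on c.id_of_machine = i.machine
 and c."timestamp" >= i.time1
 and (c."timestamp" <= i.time2 or i.time2 is null)
where i.status = 'ACTIVE'
group by i.machine, i.time1, i.time2
order by i.machine, i.time1;
```

This returns one row per ACTIVE interval over its full length; restricting it to an input time range would need an extra condition on c."timestamp".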