I have a table similar to the following:
CREATE TABLE movements (
"id" integer,
"date" timestamp with time zone,
"origin" character varying(255),
"destination" character varying(255),
"vehicle" character varying(255)
);
INSERT INTO movements (id,date,origin,destination,vehicle)
VALUES (1, '2017-11-01 00:00:00+00', 'loc_A', 'loc_B', 'V1'),
(2, '2017-11-01 00:00:00+00', 'loc_C', 'loc_B', 'V1'),
(3, '2017-11-01 00:00:00+00', 'loc_D', 'loc_B', 'V1'),
(4, '2017-11-02 00:00:00+00', 'loc_E', 'loc_B', 'V1'),
(5, '2017-11-02 00:00:00+00', 'loc_A', 'loc_B', 'V2'),
(6, '2017-11-02 00:00:00+00', 'loc_F', 'loc_B', 'V2');
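For each origin, how can I compute the number of distinct other origin locations that used the same vehicle, as well as the mean and maximum number of origin locations that used the same vehicle on the same day?
In this case the output would be something like:
location, total, daily_mean, daily_max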
loc_A , 4, 1.5, 2
loc_C , 3, 2, 2
loc_D , 3, 2, 2
loc_E , 3, 0, 0
loc_F , 1, 1, 1
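Answer 0 (score: 1)
Based on your description, I think the following should work. It uses a self-join to compute the per-day statistics in a common table expression, then aggregates across days to get the desired columns. To get the overall list, we unnest the per-day location lists and combine them back into a single array. That is probably less ideal than using a subquery on the base table, but hopefully it meets your needs: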
with day_values as (
select m.origin, m.date
, count(distinct m2.origin) as locations_with_shared_vehicle
, array_agg(distinct m2.origin) as location_list
from movements m
join movements m2
on m2.vehicle = m.vehicle
and m2.date = m.date
and m2.origin <> m.origin
group by m.origin, m.date )
select t.origin as location
, array_length( (select array( SELECT DISTINCT unnest(t2.location_list) from day_values t2 WHERE t2.origin = t.origin) ), 1) AS total_locations
, avg(locations_with_shared_vehicle) as daily_mean
, max(locations_with_shared_vehicle) as daily_max
from day_values t
group by t.origin
order by t.origin;
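One caveat about the query above: because it uses an inner join, an origin with no same-day match (loc_E in the sample data) drops out of the result, and the total is built only from same-day lists, so loc_A would come out as 3 rather than 4. The variant below is a sketch of the "subquery on the base table" alternative mentioned earlier. It assumes that `total` in the question means distinct other origins sharing a vehicle on any date, and that a day with no shared vehicle should count as 0. The CTE names `daily`, `daily_stats` and `totals` are illustrative, not from the original answer.

```sql
with daily as (
    -- One row per (origin, day). The left join keeps days on which no
    -- other origin shared the vehicle; count() ignores the NULLs, giving 0.
    select m.origin, m.date
         , count(distinct m2.origin) as shared_that_day
    from movements m
    left join movements m2
           on m2.vehicle = m.vehicle
          and m2.date = m.date
          and m2.origin <> m.origin
    group by m.origin, m.date
), daily_stats as (
    -- Collapse the per-day counts to one row per origin.
    select origin
         , avg(shared_that_day) as daily_mean
         , max(shared_that_day) as daily_max
    from daily
    group by origin
), totals as (
    -- Same self-join without the date condition (assumption: "total"
    -- spans all dates).
    select m.origin
         , count(distinct m2.origin) as total
    from movements m
    left join movements m2
           on m2.vehicle = m.vehicle
          and m2.origin <> m.origin
    group by m.origin
)
select t.origin as location
     , t.total
     , d.daily_mean
     , d.daily_max
from totals t
join daily_stats d on d.origin = t.origin
order by t.origin;
```

On the sample rows this should reproduce the values in the question's expected output, including the 3, 0, 0 row for loc_E. `avg` returns a `numeric`, so wrap it in `round(...)` if a fixed number of decimals is wanted.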