Question

I have a Postgres table named locations. It has a few million rows, with data in the following format:

 id |  location_a  |  location_b
----+--------------+--------------
 36 | Sydney       | London
 37 | Atlanta      | London
 38 | New York     | Tokyo
 39 | Tokyo        | Sydney
 40 | Tokyo        | Sydney
.....

I would like to be able to generate a pivot table of counts in the following form:

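The problem is that the number of columns is variable, so it has to be determined programmatically/dynamically rather than with a static SELECT query.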

I understand the basic concept of PL/pgSQL, in that it is a scripting language that would let me do dynamic things like this.

But I am having a lot of trouble getting started. Is there an easy way to compute the above?

Answer 1

You can create a view dynamically. The idea and a solution for a simpler case than yours are described in this answer. Please read it before continuing.
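For contrast with the dynamic approach, here is a minimal sketch of a static pivot with hard-coded columns. The three city names are taken from the sample rows in the question, and the query assumes PostgreSQL 9.4 or later for the FILTER clause. Every new city would require editing the query by hand, which is the limitation the dynamic view works around.

```sql
-- Static pivot: one hard-coded column per location_b value.
select location_a as location,
       count(*) filter (where location_b = 'London') as "London",
       count(*) filter (where location_b = 'Sydney') as "Sydney",
       count(*) filter (where location_b = 'Tokyo')  as "Tokyo"
from locations
group by location_a
order by location_a;
```

Note that this form only returns rows for cities that occur in location_a, so a city that appears only in location_b (London in the sample) gets a column but no row.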

We will use the following query to create the view:

-- every city that appears in either column
with all_locations(location) as (
    select distinct location_a
    from locations
    union
    select distinct location_b
    from locations
)

-- one row per location_a city, with a JSON object of counts keyed by location_b city
select location_a as location, json_object_agg(location_b, count order by location_b) as data
from (
    -- count rows for every pair of cities; pairs with no matching rows get 0
    select a.location as location_a, b.location as location_b, count(l.*)
    from all_locations a
    cross join all_locations b
    left join locations l on location_a = a.location and location_b = b.location
    group by 1, 2
    ) s
group by 1
order by 1;

Result:

 location |                                    data                                    
----------+----------------------------------------------------------------------------
 Atlanta  | { "Atlanta" : 0, "London" : 1, "New York" : 0, "Sydney" : 0, "Tokyo" : 0 }
 London   | { "Atlanta" : 0, "London" : 0, "New York" : 0, "Sydney" : 0, "Tokyo" : 0 }
 New York | { "Atlanta" : 0, "London" : 0, "New York" : 0, "Sydney" : 0, "Tokyo" : 1 }
 Sydney   | { "Atlanta" : 0, "London" : 1, "New York" : 0, "Sydney" : 0, "Tokyo" : 0 }
 Tokyo    | { "Atlanta" : 0, "London" : 0, "New York" : 0, "Sydney" : 2, "Tokyo" : 0 }
(5 rows)

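The list of cities is used twice inside the function, so it is stored in the array cities. Note that you could replace the first query in the function with a simpler one (it is just an ordered list of distinct cities).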

create or replace function create_locations_view()
returns void language plpgsql as $$
declare
    cities text[];
    list text;
begin
--  fill array with all cities in alphabetical order
    select array_agg(location_a order by location_a)
    from (
        select distinct location_a
        from locations
        union
        select distinct location_b
        from locations
        ) s
    into cities;

--  construct list of columns to use in select list
    select string_agg(format($s$data->>'%1$s' "%1$s"$s$, city), ', ')
    from unnest(cities) city
    into list;

--  create view from select based on the above list
    execute format($ex$
        drop view if exists locations_view;
        create view locations_view as 
        select location, %1$s
        from (
            select location_a as location, json_object_agg(location_b, count order by location_b) as data
            from (
                select a.location as location_a, b.location as location_b, count(l.*)
                from unnest(%2$L::text[]) a(location)
                cross join unnest(%2$L::text[]) b(location)
                left join locations l on location_a = a.location and location_b = b.location
                group by 1, 2
                ) s
            group by 1
        ) s
        order by 1
        $ex$, list, cities);
end $$;
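The first query in the function only builds an ordered array of distinct cities. One possibly simpler way to write it, as a sketch rather than the author's version, relies on two facts: UNION already removes duplicates, and an ARRAY(subquery) constructor keeps the order of the subquery.

```sql
-- Ordered array of all distinct cities from both columns.
-- UNION deduplicates, so "select distinct" is not needed.
select array(
    select location_a from locations
    union
    select location_b from locations
    order by 1
) as cities;
```

Inside the function, the same expression could be assigned directly, for example with cities := array(...), instead of select ... into cities.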

Call the function and select data from the created view:

select create_locations_view();
select * from locations_view;

 location | Atlanta | London | New York | Sydney | Tokyo 
----------+---------+--------+----------+--------+-------
 Atlanta  | 0       | 1      | 0        | 0      | 0
 London   | 0       | 0      | 0        | 0      | 0
 New York | 0       | 0      | 0        | 0      | 1
 Sydney   | 0       | 1      | 0        | 0      | 0
 Tokyo    | 0       | 0      | 0        | 2      | 0
(5 rows)
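The columns in this output come from the select list that the function builds with format(). The sketch below expands the same template for two sample cities to show what the generated text looks like. The second statement is a hypothetical variant, not part of the original function: it uses %L and %I so that city names containing quotes are escaped, and casts the values to integer, since ->> returns text.

```sql
-- Same template as in the function, expanded for two sample cities.
select string_agg(format($s$data->>'%1$s' "%1$s"$s$, city), ', ' order by city)
from unnest(array['Atlanta', 'New York']) city;
-- data->>'Atlanta' "Atlanta", data->>'New York' "New York"

-- Hypothetical variant: %L quotes the JSON key as a literal,
-- %I quotes the column name as an identifier, and ::int converts the text value.
select string_agg(format('(data->>%1$L)::int as %1$I', city), ', ' order by city)
from unnest(array['Atlanta', 'O''Hare']) city;
```

Because the column list is fixed when the view is created, the function has to be called again whenever a new city appears in the table.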

I have used this method several times, but I have no experience with really big data, so I cannot guarantee that it is efficient.
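If performance turns out to be a concern, one variation that might be worth trying is to aggregate the table once and join the much smaller set of pair counts against the city grid, instead of joining the full table. This is an untested assumption about what helps, not a measured result. The CTE name pair_counts and the column cnt are made up for this sketch, and the output should match the query above.

```sql
with all_locations(location) as (
    select location_a from locations
    union
    select location_b from locations
),
-- one pass over the big table: counts per existing pair of cities
pair_counts as (
    select location_a, location_b, count(*) as cnt
    from locations
    group by location_a, location_b
)
select a.location as location,
       -- pairs that never occur have no row in pair_counts, so default to 0
       json_object_agg(b.location, coalesce(p.cnt, 0) order by b.location) as data
from all_locations a
cross join all_locations b
left join pair_counts p
       on p.location_a = a.location and p.location_b = b.location
group by a.location
order by a.location;
```

Running explain analyze on both versions against your own data is the way to check whether it makes a difference.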

How to build a table with dynamic columns using PL/pgSQL
