Joining on multiple columns in PostgreSQL

Time: 2017-02-20 08:19:23

Tags: sql database postgresql

I have the following two tables in PostgreSQL:

    TABLE: act_codes
    ===================
    act_code  act_desc
    -------------------
       1      sleeping
       2      commuting
       3      eating
       4      working

    TABLE: data
    =============================================
    act1_1     act1_2      act1_3     act1_4
    ---------------------------------------------
      1         1           3           4
      1         2           2           3
      1         1           2           2
      1         2           2           3
      1         1           1           2
      1         1           3           4
      1         2           2           4
      1         1           1           3
      1         3           3           4
      1         1           4           4

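The act_codes table is basically a lookup table of activities (with a code and a description), and the data table contains (in this case) the activity codes at 4 different times (act1_1, act1_2, act1_3 and act1_4).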
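To make the queries in this thread reproducible, here is a minimal setup sketch. The column types (integer codes, text descriptions) and the primary key are assumptions, not something stated in the question; the key column is named act_code because that is the name the queries in this thread use.

```sql
-- Assumed schema: integer activity codes, text descriptions.
CREATE TABLE act_codes (
    act_code integer PRIMARY KEY,
    act_desc text NOT NULL
);

-- One row per respondent, one column per time slot (assumed integer codes).
CREATE TABLE data (
    act1_1 integer,
    act1_2 integer,
    act1_3 integer,
    act1_4 integer
);

INSERT INTO act_codes (act_code, act_desc) VALUES
    (1, 'sleeping'),
    (2, 'commuting'),
    (3, 'eating'),
    (4, 'working');

-- The ten sample rows shown in the question.
INSERT INTO data (act1_1, act1_2, act1_3, act1_4) VALUES
    (1, 1, 3, 4),
    (1, 2, 2, 3),
    (1, 1, 2, 2),
    (1, 2, 2, 3),
    (1, 1, 1, 2),
    (1, 1, 3, 4),
    (1, 2, 2, 4),
    (1, 1, 1, 3),
    (1, 3, 3, 4),
    (1, 1, 4, 4);
```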

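I am trying to query this to get a table of counts for each activity. I have managed to do this for each individual column (act1_4 in this example) as follows: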

    SELECT A.act_code, A.act_desc, COUNT (act1_4) 
    FROM act_codes AS A
    LEFT JOIN data AS D 
    ON D.act1_4 = A.act_code
    GROUP BY A.act_code, A.act_desc;   

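This works for that one column, but I have a large number of columns to go through, so I would prefer a way to do this within a single SQL query.

I now have the following query (many thanks to banazs):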

    SELECT
        ac.act_code, 
        ac.act_desc,
        act_time,
        COUNT(activity) AS act_count
    FROM
        (SELECT
            UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS act_time,
            UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS activity
        FROM
            data d) t
    RIGHT JOIN
        act_codes ac ON t.activity = ac.act_code
    GROUP BY
        ac.act_code, 
        ac.act_desc,
        act_time, activity
    ORDER BY 
        activity, 
        act_time
    ;

Which outputs:

    act_code        act_desc        act_time        act_count
    ---------------------------------------------------------
        1           sleeping            act1_1          10
        1           sleeping            act1_2          6
        1           sleeping            act1_3          2
        2           commuting           act1_2          3
        2           commuting           act1_3          4
        2           commuting           act1_4          2
        3           eating              act1_2          1
        3           eating              act1_3          3
        3           eating              act1_4          3
        4           working             act1_3          1
        4           working             act1_4          5

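This is basically what I wanted. Ideally the rows with zero counts could be added somehow, but I am guessing this might best be done as a separate step (e.g. building a crosstab in R or something).

3 answers:

Answer 0 (score: 2)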

You can "unpivot" the data using UNNEST:

    SELECT
        UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
        UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS value
    FROM
        data d
    ;
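As a side note, the same unpivot can also be written with a LATERAL VALUES list instead of two parallel UNNEST calls, which keeps each column name right next to its value. This is only a sketch and not part of the original answer: the CTE stands in for the first two rows of the real data table so the statement runs on its own, and LATERAL needs PostgreSQL 9.3 or later.

```sql
-- The CTE below is a stand-in for the real "data" table (first two sample rows).
WITH data (act1_1, act1_2, act1_3, act1_4) AS (
    VALUES
        (1, 1, 3, 4),
        (1, 2, 2, 3)
)
SELECT
    v.column_name,
    v.val
FROM
    data AS d
-- For every row of data, emit one (column_name, val) pair per time column.
CROSS JOIN LATERAL (
    VALUES
        ('act1_1', d.act1_1),
        ('act1_2', d.act1_2),
        ('act1_3', d.act1_3),
        ('act1_4', d.act1_4)
) AS v (column_name, val)
;
```

With two input rows and four time columns this returns eight (column_name, val) pairs. Drop the CTE to run it against the real table.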

Count the activities:

    SELECT
        ac.act_code,
        ac.act_desc,
        COUNT(*)
    FROM
        (SELECT
            UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
            UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS val
        FROM
            data d) t
    INNER JOIN
        act_codes ac ON t.val = ac.act_code
    GROUP BY
        ac.act_code,
        ac.act_desc
    ;

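Answer 1 (score: 0)

Thanks @banazs, this was very useful in helping me understand how to build such a query.

However, I am still struggling to arrange the query so that the output is split into one column of counts per time point. Apologies, I think the labels here are a bit confusing ('act1_1' refers to the activity done at time_1, 'act1_2' to the activity done at time_2, and so on). The result I am trying to reach is as follows: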

    act_code    act_desc        count_act1_1    count_act1_2    count_act1_3    count_act1_4
    ----------------------------------------------------------------------------------------
        1       sleeping            10              6               2               0
        2       commuting           0               3               4               2
        3       eating              0               1               3               3
        4       working             0               0               1               5

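I am not worried about the output being in columns, since I can easily reshape it, but it is important that the zeros are present in the table. Is this possible?

Answer 2 (score: 0)

To produce the table above, the query needs some redesign.

First, you have to build a helper result set that contains the Cartesian product of the column names and the activities: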

    SELECT
        *
    FROM
        act_codes ac
    -- if you have lots of columns, you can query their
    -- names from the information_schema.columns system
    -- view
    CROSS JOIN -- CROSS JOIN pairs every row of one table with every row of the other
        (SELECT
            column_name
        FROM
            information_schema.columns
        WHERE
            table_schema = 'stackoverflow'
            AND table_name = 'data'
            AND column_name LIKE 'act%') cn
    ;
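Since the column names are already available from information_schema.columns, the hard-coded UNNEST arrays used elsewhere in this thread could in principle be generated instead of typed by hand. The following is only a sketch under assumptions: it assumes the data table lives in the current schema (the CREATE TABLE IF NOT EXISTS line is there purely so the statement runs on an empty database), and it only builds the text of an unpivot query rather than running it.

```sql
-- Only needed on an empty database; skipped if "data" already exists.
CREATE TABLE IF NOT EXISTS data (
    act1_1 integer,
    act1_2 integer,
    act1_3 integer,
    act1_4 integer
);

-- Build the text of an unpivot query from the catalog.
-- %L quotes a value as a string literal, %I quotes it as an identifier.
SELECT
    format(
        'SELECT v.column_name, v.val FROM data AS d CROSS JOIN LATERAL (VALUES %s) AS v (column_name, val)',
        string_agg(
            format('(%L, d.%I)', column_name::text, column_name::text),
            ', '
            ORDER BY ordinal_position
        )
    ) AS generated_query
FROM
    information_schema.columns
WHERE
    table_schema = current_schema()  -- assumption: data is in the current schema
    AND table_name = 'data'
    AND column_name LIKE 'act%'
;
```

The result is a single text value containing a query. In psql (9.6 or later) it can be executed directly by ending the statement with \gexec instead of a semicolon; otherwise copy the generated text and run it as a separate statement.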

Add the activity counts:

    SELECT
        ac.act_code,
        ac.act_desc,
        cn.column_name,
        -- COALESCE substitutes zero where the joined count is NULL
        COALESCE(ad.act_no, 0) AS act_no
    FROM
        act_codes ac
    CROSS JOIN
        (SELECT
            column_name
        FROM
            information_schema.columns
        WHERE
            table_schema = 'stackoverflow'
            AND table_name = 'data'
            AND column_name LIKE 'act%') cn
    -- you need a LEFT JOIN here to preserve all rows
    -- of the Cartesian product
    LEFT JOIN
        (SELECT
            t.column_name,
            t.act_code,
            COUNT(*) AS act_no
        FROM
            (SELECT
                UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
                UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS act_code
            FROM
                data d) t
        GROUP BY
            t.column_name,
            t.act_code) ad ON ad.act_code = ac.act_code AND ad.column_name = cn.column_name
    ;

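Formatting the result to look like yours is possible, but a bit messy. You need to create two tables: the first holds the result set of the previous query, and the second holds the column names.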

    CREATE TABLE acts AS
        SELECT
            ac.act_code,
            ac.act_desc,
            cn.column_name,
            COALESCE(ad.act_no, 0) AS act_no
        FROM
            act_codes ac
        CROSS JOIN
            (SELECT
                column_name
            FROM
                information_schema.columns
            WHERE
                table_schema = 'stackoverflow'
                AND table_name = 'data'
                AND column_name LIKE 'act%') cn
        LEFT JOIN
            (SELECT
                t.column_name,
                t.act_code,
                COUNT(*) AS act_no
            FROM
                (SELECT
                    UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
                    UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS act_code
                FROM
                    data d) t
            GROUP BY
                t.column_name,
                t.act_code) ad ON ad.act_code = ac.act_code AND ad.column_name = cn.column_name
    ;

    CREATE TABLE column_names AS
        SELECT
            column_name
        FROM
            information_schema.columns
        WHERE
            table_schema = 'stackoverflow'
            AND table_name = 'data'
            AND column_name LIKE 'act%'
    ;

Install the tablefunc extension:

    CREATE EXTENSION tablefunc;

It provides the crosstab() function, which you can use to get the output you described.

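    SELECT
        *
    FROM
        crosstab(
            'SELECT act_desc, column_name, act_no FROM acts ORDER BY 1',
            -- the category query should return the names in a fixed order
            -- that matches the output column list below
            'SELECT column_name FROM column_names ORDER BY 1'
        )
    AS
        ct (
            "act_desc" text,
            "act1_1" int,
            "act1_2" int,
            "act1_3" int,
            "act1_4" int
        )
    ;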

    +-----------+--------+--------+--------+--------+
    | act_desc  | act1_1 | act1_2 | act1_3 | act1_4 |
    +-----------+--------+--------+--------+--------+
    | commuting |      0 |      3 |      4 |      2 |
    | eating    |      0 |      1 |      3 |      3 |
    | sleeping  |     10 |      6 |      2 |      0 |
    | working   |      0 |      0 |      1 |      5 |
    +-----------+--------+--------+--------+--------+
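
If creating helper tables and installing an extension is more than you want, a wide table with zeros can probably also be produced with conditional aggregation. The sketch below is an alternative technique, not the method of the answer above: it uses the aggregate FILTER clause (PostgreSQL 9.4 or later), and the two CTEs are stand-ins for the act_codes and data tables from the question so that it runs on its own. The count_act1_N column names are taken from the desired output in answer 1.

```sql
-- Stand-ins for the real tables; remove both CTEs to query the actual tables.
WITH act_codes (act_code, act_desc) AS (
    VALUES (1, 'sleeping'), (2, 'commuting'), (3, 'eating'), (4, 'working')
),
data (act1_1, act1_2, act1_3, act1_4) AS (
    VALUES
        (1, 1, 3, 4), (1, 2, 2, 3), (1, 1, 2, 2), (1, 2, 2, 3), (1, 1, 1, 2),
        (1, 1, 3, 4), (1, 2, 2, 4), (1, 1, 1, 3), (1, 3, 3, 4), (1, 1, 4, 4)
)
SELECT
    ac.act_code,
    ac.act_desc,
    -- Each FILTER counts only the rows where that time slot holds this activity;
    -- an activity that never occurs in a slot yields 0 rather than a missing row.
    COUNT(*) FILTER (WHERE d.act1_1 = ac.act_code) AS count_act1_1,
    COUNT(*) FILTER (WHERE d.act1_2 = ac.act_code) AS count_act1_2,
    COUNT(*) FILTER (WHERE d.act1_3 = ac.act_code) AS count_act1_3,
    COUNT(*) FILTER (WHERE d.act1_4 = ac.act_code) AS count_act1_4
FROM
    act_codes AS ac
-- Pair every activity with every data row so each activity gets a group.
CROSS JOIN
    data AS d
GROUP BY
    ac.act_code,
    ac.act_desc
ORDER BY
    ac.act_code
;
```

On the sample data this should give the same counts as the table requested in answer 1. The trade-offs are that every time column has to be listed explicitly, the CROSS JOIN touches (number of activities) x (number of data rows) combinations, and an empty data table returns no rows at all instead of rows of zeros.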