I have the following two tables in PostgreSQL:
TABLE: act_codes
===================
act_code  act_desc
--------------------
1         sleeping
2         commuting
3         eating
4         working
TABLE: data
===================
act1_1  act1_2  act1_3  act1_4
---------------------------------------------
1       1       3       4
1       2       2       3
1       1       2       2
1       2       2       3
1       1       1       2
1       1       3       4
1       2       2       4
1       1       1       3
1       3       3       4
1       1       4       4
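The act_codes table is basically a lookup table of activities (code and description), and the data table holds the activity codes recorded at (in this case) 4 different times (act1_1, act1_2, act1_3 and act1_4).
I am trying to query this to get a table of counts for each activity. I have managed to do this for a single column (act1_4 in this example) as follows: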
SELECT A.act_code, A.act_desc, COUNT (act1_4)
FROM act_codes AS A
LEFT JOIN data AS D
ON D.act1_4 = A.act_code
GROUP BY A.act_code, A.act_desc;
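This works for that one column, but I have a large number of columns to process, so I would prefer a way to do it in a single SQL query.
I now have the following query (many thanks to banazs):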
SELECT
    ac.act_code,
    ac.act_desc,
    act_time,
    COUNT(activity) AS act_count
FROM
    (SELECT
        UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS act_time,
        UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS activity
    FROM
        data d) t
RIGHT JOIN
    act_codes ac ON t.activity = ac.act_code
GROUP BY
    ac.act_code,
    ac.act_desc,
    act_time, activity
ORDER BY
    activity,
    act_time
;
This outputs:
act_code  act_desc   act_time  act_count
---------------------------------------------------------
1         sleeping   act1_1    10
1         sleeping   act1_2    6
1         sleeping   act1_3    2
2         commuting  act1_2    3
2         commuting  act1_3    4
2         commuting  act1_4    2
3         eating     act1_2    1
3         eating     act1_3    3
3         eating     act1_4    3
4         working    act1_3    1
4         working    act1_4    5
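This is basically what I want. Ideally, rows with zero counts would be included somehow, but I am guessing that might be best done as a separate step (for example, building a crosstab in R or something similar).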
Answer 0 (score: 2)
You can use UNNEST to turn the columns into rows:
SELECT
    UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
    UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS value
FROM
    data d
;
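One caveat, offered as a sketch rather than as part of the original answer: two set-returning functions in the same SELECT list pair up row by row only when both arrays have the same length (as they do here), and the behavior for unequal lengths changed in PostgreSQL 10. Assuming PostgreSQL 9.4 or later, the multi-argument form of unnest() in the FROM clause expands the arrays in lockstep explicitly. The alias u and the column name val are placeholder names.

```sql
-- Same unpivot as above, written with multi-argument unnest() in FROM.
-- Assumes PostgreSQL 9.4+; "u" and "val" are arbitrary names.
SELECT
    u.column_name,
    u.val
FROM
    data d
CROSS JOIN LATERAL
    unnest(
        array['act1_1','act1_2','act1_3','act1_4'],
        array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]
    ) AS u(column_name, val)
;
```

Either form yields one row per (source row, column) pair, so the 10 sample rows become 40 rows.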
To count the activities:
SELECT
    ac.act_code,
    ac.act_desc,
    COUNT(*)
FROM
    (SELECT
        UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
        UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS val
    FROM
        data d) t
INNER JOIN
    act_codes ac ON t.val = ac.act_code
GROUP BY
    ac.act_code,
    ac.act_desc
;
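Answer 1 (score: 0)
Thanks @banazs, this was really useful in helping me understand how to build a query like this.
However, I am still struggling to arrange the query so that the output is split into one count column per time point. Apologies, I think the labels here are a bit confusing ('act1_1' refers to the activity done at time_1, 'act1_2' to the activity done at time_2, and so on). The result I am trying to achieve is as follows: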
act_code  act_desc   count_act1_1  count_act1_2  count_act1_3  count_act1_4
----------------------------------------------------------------------------------------
1         sleeping   10            6             2             0
2         commuting  0             3             4             2
3         eating     0             1             3             3
4         working    0             0             1             5
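I am not worried about whether the output is laid out in columns, since I can easily reshape it, but it is important that the zeros appear in the table. Is this possible?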
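Answer 2 (score: 0)
To produce the table above, the query needs some redesign.
First, you have to create a helper table containing the Cartesian product of the column names and the activities: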
SELECT
    *
FROM
    act_codes ac
-- if you have lots of columns, you can query their
-- names from the information_schema.columns system
-- view
CROSS JOIN -- CROSS JOIN pairs every row of one table with every row of the other
    (SELECT
        column_name
    FROM
        information_schema.columns
    WHERE
        table_schema = 'stackoverflow' -- the schema that holds the data table
        AND table_name = 'data'
        AND column_name LIKE 'act%') cn
;
Then add the activity counts:
SELECT
    ac.act_code,
    ac.act_desc,
    cn.column_name,
    -- COALESCE replaces NULL (no matching rows) with zero
    COALESCE(ad.act_no, 0) AS act_no
FROM
    act_codes ac
CROSS JOIN
    (SELECT
        column_name
    FROM
        information_schema.columns
    WHERE
        table_schema = 'stackoverflow'
        AND table_name = 'data'
        AND column_name LIKE 'act%') cn
-- you need to use LEFT JOIN to preserve all rows
-- from the Cartesian product
LEFT JOIN
    (SELECT
        t.column_name,
        t.act_code,
        COUNT(*) AS act_no
    FROM
        (SELECT
            UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
            UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS act_code
        FROM
            data d) t
    GROUP BY
        t.column_name,
        t.act_code) ad ON ad.act_code = ac.act_code AND ad.column_name = cn.column_name
;
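The query above still lists the columns by hand in the two UNNEST calls, even though the Cartesian product takes their names from information_schema. As a hedged sketch, assuming PostgreSQL 9.5 or later (for to_jsonb) and assuming that every column matching 'act%' holds an integer code, the per-column counts could be derived without naming the columns. The alias kv is a placeholder name.

```sql
-- Unpivot every row into (column name, value) pairs via JSONB,
-- then count per column and activity code.
-- Assumes PostgreSQL 9.5+ and integer-valued act% columns.
SELECT
    kv.key        AS column_name,
    kv.value::int AS act_code,
    COUNT(*)      AS act_no
FROM
    data d
CROSS JOIN LATERAL
    jsonb_each_text(to_jsonb(d)) AS kv(key, value)
WHERE
    kv.key LIKE 'act%'
GROUP BY
    kv.key,
    kv.value
;
```

This could stand in for the ad subquery in the LEFT JOIN. It trades an explicit column list for a JSON round trip per row, so it may be noticeably slower on large tables.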
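Formatting the result to look like yours is possible, but a bit messy. You need to create two tables: the first holds the result set of the previous query, and the second holds the column names.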
CREATE TABLE acts AS
SELECT
    ac.act_code,
    ac.act_desc,
    cn.column_name,
    COALESCE(ad.act_no, 0) AS act_no
FROM
    act_codes ac
CROSS JOIN
    (SELECT
        column_name
    FROM
        information_schema.columns
    WHERE
        table_schema = 'stackoverflow'
        AND table_name = 'data'
        AND column_name LIKE 'act%') cn
LEFT JOIN
    (SELECT
        t.column_name,
        t.act_code,
        COUNT(*) AS act_no
    FROM
        (SELECT
            UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
            UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS act_code
        FROM
            data d) t
    GROUP BY
        t.column_name,
        t.act_code) ad ON ad.act_code = ac.act_code AND ad.column_name = cn.column_name
;
CREATE TABLE column_names AS
SELECT
    column_name
FROM
    information_schema.columns
WHERE
    table_schema = 'stackoverflow'
    AND table_name = 'data'
    AND column_name LIKE 'act%'
;
CREATE EXTENSION tablefunc;
The tablefunc extension provides the crosstab() function, which you can use to get the output you described.
SELECT
    *
FROM
    crosstab(
        'SELECT act_desc, column_name, act_no FROM acts ORDER BY 1',
        -- order the categories so they line up with the column list below
        'SELECT column_name FROM column_names ORDER BY 1'
    )
AS
    ct (
        "act_desc" text,
        "act1_1" int,
        "act1_2" int,
        "act1_3" int,
        "act1_4" int
    )
;
+-----------+--------+--------+--------+--------+
| act_desc | act1_1 | act1_2 | act1_3 | act1_4 |
+-----------+--------+--------+--------+--------+
| commuting | 0 | 3 | 4 | 2 |
| eating | 0 | 1 | 3 | 3 |
| sleeping | 10 | 6 | 2 | 0 |
| working | 0 | 0 | 1 | 5 |
+-----------+--------+--------+--------+--------+
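If the set of time columns is fixed and known in advance, conditional aggregation is a possible alternative that avoids the helper tables and crosstab(). This is a sketch under assumptions, not part of the answer above: it relies on the FILTER clause (PostgreSQL 9.4 or later), and the count_act1_N aliases are borrowed from the desired output in the question.

```sql
-- One output row per activity, one filtered count per time column.
-- LEFT JOIN ... ON true keeps every activity even if data is empty,
-- and activities that never occur in a column get a count of 0.
SELECT
    ac.act_code,
    ac.act_desc,
    COUNT(*) FILTER (WHERE d.act1_1 = ac.act_code) AS count_act1_1,
    COUNT(*) FILTER (WHERE d.act1_2 = ac.act_code) AS count_act1_2,
    COUNT(*) FILTER (WHERE d.act1_3 = ac.act_code) AS count_act1_3,
    COUNT(*) FILTER (WHERE d.act1_4 = ac.act_code) AS count_act1_4
FROM
    act_codes ac
LEFT JOIN
    data d ON true
GROUP BY
    ac.act_code,
    ac.act_desc
ORDER BY
    ac.act_code
;
```

On the sample data this should reproduce the table requested in the question, zeros included. The join pairs every activity with every data row, so the work grows with the product of the two table sizes; that is fine for a small lookup table but worth keeping in mind for a large one.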