SQL subquery with GROUP BY

Time: 2019-07-09 05:04:25

Tags: sql

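My goal is to compute percent as strata / total, but I noticed that all of my total values are the same: each one is the first value produced by grouping on REF_YEAR. Every strata value is therefore divided by the same number, which gives the wrong percentages. My code is below. I would like to know what is wrong with the subquery. I am running it in Jupyter Lab with an R kernel.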

SELECT
    REF_YEAR,
    STRATA_DESC_E,
    COUNT(*) AS strata,
    (SELECT COUNT(*) FROM df GROUP BY REF_YEAR) AS total,
    COUNT(*) * 100.0 / (SELECT COUNT(*) FROM df GROUP BY REF_YEAR) AS percent
FROM 
    df
GROUP BY 
    REF_YEAR, STRATA_DESC_E

In the output (originally posted as an image), you can see that the total values are all the same.

3 answers:

Answer 0: (score: 0)

Try correlating the subquery with the outer query on REF_YEAR:

SELECT
    d.REF_YEAR,
    d.STRATA_DESC_E,
    COUNT(*) AS strata,
    -- Correlate on REF_YEAR so that each row gets the total for its own year
    (SELECT COUNT(*) FROM df x WHERE x.REF_YEAR = d.REF_YEAR) AS total,
    COUNT(*) * 100.0 / (SELECT COUNT(*) FROM df x WHERE x.REF_YEAR = d.REF_YEAR) AS percent
FROM df d
GROUP BY d.REF_YEAR, d.STRATA_DESC_E
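
To see why the uncorrelated subquery gives every row the same total, and what correlating it on REF_YEAR changes, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and SQLite is only an assumption about the engine (the question does not say which one sits behind the R kernel). SQLite silently takes the first row of a scalar subquery that returns several rows; some other databases raise an error instead.

```python
import sqlite3

# Hypothetical sample data (not from the question): 3 rows for 2018, 5 for 2019.
rows = [
    (2018, "A"), (2018, "A"), (2018, "B"),
    (2019, "A"), (2019, "B"), (2019, "B"), (2019, "B"), (2019, "C"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (REF_YEAR INTEGER, STRATA_DESC_E TEXT)")
conn.executemany("INSERT INTO df VALUES (?, ?)", rows)

# Uncorrelated subquery: it produces one count per year, but a scalar
# subquery can supply only one value, so SQLite uses its first row for
# every output row.
uncorrelated = conn.execute("""
    SELECT d.REF_YEAR, d.STRATA_DESC_E, COUNT(*) AS strata,
           (SELECT COUNT(*) FROM df GROUP BY REF_YEAR) AS total
    FROM df d
    GROUP BY d.REF_YEAR, d.STRATA_DESC_E
    ORDER BY d.REF_YEAR, d.STRATA_DESC_E
""").fetchall()

# Correlated subquery: the WHERE clause ties the count to the outer row's year.
correlated = conn.execute("""
    SELECT d.REF_YEAR, d.STRATA_DESC_E, COUNT(*) AS strata,
           (SELECT COUNT(*) FROM df x WHERE x.REF_YEAR = d.REF_YEAR) AS total
    FROM df d
    GROUP BY d.REF_YEAR, d.STRATA_DESC_E
    ORDER BY d.REF_YEAR, d.STRATA_DESC_E
""").fetchall()

print("uncorrelated:", uncorrelated)
print("correlated:  ", correlated)
```

With the correlated version, the total column is 3 for the 2018 rows and 5 for the 2019 rows, so strata / total is the within-year share.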

Answer 1: (score: 0)

I don't know which database you are using. If it supports analytic (window) functions, this is a simple way to write the logic:

SELECT
    REF_YEAR,
    STRATA_DESC_E,
    COUNT(*) AS strata,
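    -- The window is evaluated after GROUP BY, so sum the per-group counts;
    -- a plain COUNT(*) OVER (...) would count groups per year, not rows.
    SUM(COUNT(*)) OVER (PARTITION BY REF_YEAR) AS total,
    100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY REF_YEAR) AS percent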
FROM df
GROUP BY
    REF_YEAR,
    STRATA_DESC_E;

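If you are using MySQL 5.7 or another database that does not support analytic functions, you can instead join to a subquery that computes the row count for each REF_YEAR: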

SELECT
    t1.REF_YEAR,
    t1.STRATA_DESC_E,
    COUNT(*) AS strata,
    t2.cnt AS total,
    100.0 * COUNT(*) / t2.cnt AS percent
FROM df t1
INNER JOIN
(
    SELECT REF_YEAR, COUNT(*) AS cnt
    FROM df
    GROUP BY REF_YEAR
) t2
    ON t1.REF_YEAR = t2.REF_YEAR
GROUP BY
    t1.REF_YEAR,
    t1.STRATA_DESC_E;

Answer 2: (score: 0)

In MySQL 8+, you can use window functions. The correct formulation is:

SELECT REF_YEAR, STRATA_DESC_E,
       COUNT(*) AS strata,
       SUM(COUNT(*)) OVER (PARTITION BY REF_YEAR) AS total,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY REF_YEAR) AS percent
FROM df
GROUP BY REF_YEAR, STRATA_DESC_E;
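
The reason for SUM(COUNT(*)) OVER rather than a plain COUNT(*) OVER is that the window function runs on the already grouped rows. A rough sketch of the difference, using Python's built-in sqlite3 module with made-up sample rows: this assumes the bundled SQLite is 3.25 or newer (the first release with window functions), and the column name n_groups is introduced here only for illustration.

```python
import sqlite3

# Hypothetical sample data (not from the question): 3 rows for 2018, 5 for 2019.
rows = [
    (2018, "A"), (2018, "A"), (2018, "B"),
    (2019, "A"), (2019, "B"), (2019, "B"), (2019, "B"), (2019, "C"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (REF_YEAR INTEGER, STRATA_DESC_E TEXT)")
conn.executemany("INSERT INTO df VALUES (?, ?)", rows)

result = conn.execute("""
    SELECT REF_YEAR, STRATA_DESC_E,
           COUNT(*) AS strata,
           -- counts the grouped rows (distinct strata) in each year
           COUNT(*) OVER (PARTITION BY REF_YEAR) AS n_groups,
           -- adds up the per-group counts, i.e. the rows in each year
           SUM(COUNT(*)) OVER (PARTITION BY REF_YEAR) AS total
    FROM df
    GROUP BY REF_YEAR, STRATA_DESC_E
    ORDER BY REF_YEAR, STRATA_DESC_E
""").fetchall()

for row in result:
    print(row)
```

For 2018 there are two strata covering three rows, so n_groups is 2 while total is 3; dividing by n_groups would give the wrong percentage.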