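My goal is to compute percent as strata / total, but I noticed that every total value is the same: it is the first value produced after grouping by REF_YEAR, so dividing each strata count by that same value gives the wrong percentages. My code is below. I would like to know what is wrong with the subquery. I am running it in Jupyter Lab with an R kernel.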
SELECT
REF_YEAR,
STRATA_DESC_E,
COUNT(*) AS strata,
(SELECT COUNT(*) FROM df GROUP BY REF_YEAR) AS total,
COUNT(*) * 100.0 / (SELECT COUNT(*) FROM df GROUP BY REF_YEAR) AS percent
FROM
df
GROUP BY
REF_YEAR, STRATA_DESC_E
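In the output, you can see that the total values are all the same.
Answer 0 (score: 0)
Try correlating the subquery with the outer query on REF_YEAR: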
SELECT
REF_YEAR,
STRATA_DESC_E,
COUNT(*) AS strata,
-- correlate on REF_YEAR so each year gets its own total
(SELECT COUNT(*) FROM df x WHERE x.REF_YEAR = d.REF_YEAR) AS total,
COUNT(*) * 100.0 / (SELECT COUNT(*) FROM df x WHERE x.REF_YEAR = d.REF_YEAR) AS percent
FROM df d
GROUP BY REF_YEAR, STRATA_DESC_E
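To see why the correlation matters, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the choice of SQLite are assumptions for illustration; the question does not say which engine is behind df. In SQLite, a scalar subquery that returns several rows yields only the first one, which matches the repeated total described in the question.

```python
import sqlite3

# Hypothetical toy data: 3 rows for 2019 and 5 rows for 2020.
rows = [(2019, "A"), (2019, "A"), (2019, "B"),
        (2020, "A"), (2020, "B"), (2020, "B"), (2020, "B"), (2020, "C")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (REF_YEAR INTEGER, STRATA_DESC_E TEXT)")
conn.executemany("INSERT INTO df VALUES (?, ?)", rows)

# Uncorrelated subquery: it produces one row per year, but as a scalar
# expression only a single value is used, so every group gets the same total.
uncorrelated = conn.execute("""
    SELECT REF_YEAR, STRATA_DESC_E, COUNT(*) AS strata,
           (SELECT COUNT(*) FROM df GROUP BY REF_YEAR) AS total
    FROM df
    GROUP BY REF_YEAR, STRATA_DESC_E
    ORDER BY REF_YEAR, STRATA_DESC_E
""").fetchall()

# Correlated subquery: evaluated against the REF_YEAR of each outer group.
correlated = conn.execute("""
    SELECT d.REF_YEAR, d.STRATA_DESC_E, COUNT(*) AS strata,
           (SELECT COUNT(*) FROM df x WHERE x.REF_YEAR = d.REF_YEAR) AS total
    FROM df d
    GROUP BY d.REF_YEAR, d.STRATA_DESC_E
    ORDER BY d.REF_YEAR, d.STRATA_DESC_E
""").fetchall()

print("uncorrelated totals:", [r[3] for r in uncorrelated])
print("correlated totals:  ", [r[3] for r in correlated])
```

Other engines may reject the uncorrelated form outright with an error such as "subquery returns more than one row" rather than silently picking a value.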
Answer 1 (score: 0)
I don't know which database you are using. If it supports analytic (window) functions, this is a simple way to write the logic:
SELECT
REF_YEAR,
STRATA_DESC_E,
COUNT(*) AS strata,
SUM(COUNT(*)) OVER (PARTITION BY REF_YEAR) AS total,
100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY REF_YEAR) AS percent
FROM df
GROUP BY
REF_YEAR,
STRATA_DESC_E;
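If you are using MySQL 5.7 or another database that does not support analytic functions, we can instead join to a subquery that finds the row count for each REF_YEAR group: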
SELECT
t1.REF_YEAR,
t1.STRATA_DESC_E,
COUNT(*) AS strata,
t2.cnt AS total,
100.0 * COUNT(*) / t2.cnt AS percent
FROM df t1
INNER JOIN
(
SELECT REF_YEAR, COUNT(*) AS cnt
FROM df
GROUP BY REF_YEAR
) t2
ON t1.REF_YEAR = t2.REF_YEAR
GROUP BY
t1.REF_YEAR,
t1.STRATA_DESC_E,
t2.cnt;
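As a rough check of the join approach, the sketch below runs a query of the same shape through Python's sqlite3 module. The sample rows are hypothetical and SQLite is an assumed engine. If the per-year totals are right, the percentages within each year should add up to 100.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy data: 4 rows for 2019 and 2 rows for 2020.
rows = [(2019, "A"), (2019, "A"), (2019, "A"), (2019, "B"),
        (2020, "A"), (2020, "B")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (REF_YEAR INTEGER, STRATA_DESC_E TEXT)")
conn.executemany("INSERT INTO df VALUES (?, ?)", rows)

# The derived table t2 holds one row per year with that year's row count.
# t2.cnt is also listed in GROUP BY, which stricter engines require.
result = conn.execute("""
    SELECT t1.REF_YEAR, t1.STRATA_DESC_E,
           COUNT(*) AS strata,
           t2.cnt AS total,
           100.0 * COUNT(*) / t2.cnt AS percent
    FROM df t1
    INNER JOIN (
        SELECT REF_YEAR, COUNT(*) AS cnt
        FROM df
        GROUP BY REF_YEAR
    ) t2
        ON t1.REF_YEAR = t2.REF_YEAR
    GROUP BY t1.REF_YEAR, t1.STRATA_DESC_E, t2.cnt
    ORDER BY t1.REF_YEAR, t1.STRATA_DESC_E
""").fetchall()

# Sanity check: sum the percentages for each year.
percent_by_year = defaultdict(float)
for ref_year, _strata_desc, _strata, _total, percent in result:
    percent_by_year[ref_year] += percent

for row in result:
    print(row)
print(dict(percent_by_year))
```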
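Answer 2 (score: 0)
In MySQL 8+, you can use window functions. The window is evaluated after GROUP BY, so the per-year total is the sum of the group counts; the correct formula is: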
SELECT REF_YEAR, STRATA_DESC_E,
COUNT(*) AS strata,
SUM(COUNT(*)) OVER (PARTITION BY REF_YEAR) AS total,
100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY REF_YEAR) AS percent
FROM df
GROUP BY REF_YEAR, STRATA_DESC_E;
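The difference between counting and summing inside the window is easy to miss, so here is a small sketch that puts both expressions side by side. It uses Python's sqlite3 module with made-up rows; SQLite is an assumption (the answer targets MySQL 8+), and window functions need SQLite 3.25 or newer. The column name n_groups is a hypothetical label added for the comparison.

```python
import sqlite3

# Hypothetical toy data: 2019 has 3 rows in 2 strata, 2020 has 5 rows in 3 strata.
rows = [(2019, "A"), (2019, "A"), (2019, "B"),
        (2020, "A"), (2020, "B"), (2020, "B"), (2020, "B"), (2020, "C")]

conn = sqlite3.connect(":memory:")  # assumes SQLite >= 3.25 for window functions
conn.execute("CREATE TABLE df (REF_YEAR INTEGER, STRATA_DESC_E TEXT)")
conn.executemany("INSERT INTO df VALUES (?, ?)", rows)

# After GROUP BY, each (REF_YEAR, STRATA_DESC_E) pair is a single row.
# COUNT(*) OVER counts those grouped rows; SUM(COUNT(*)) OVER adds up
# the per-group counts, which gives the number of underlying rows per year.
result = conn.execute("""
    SELECT REF_YEAR, STRATA_DESC_E,
           COUNT(*) AS strata,
           COUNT(*) OVER (PARTITION BY REF_YEAR) AS n_groups,
           SUM(COUNT(*)) OVER (PARTITION BY REF_YEAR) AS total
    FROM df
    GROUP BY REF_YEAR, STRATA_DESC_E
    ORDER BY REF_YEAR, STRATA_DESC_E
""").fetchall()

for row in result:
    print(row)
```

Dividing by n_groups would give percentages relative to the number of strata in a year, not the number of rows, which is why the sum is the right denominator.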