How to get a percentage of the total when the query has a GROUP BY?

Date: 2013-01-25 05:40:42

Tags: mysql sql

Say I have a denormalized table that contains movie actor names and the movies they appear in. For example:

CREATE TABLE movies_actors (
  movies_actors_id INT,
  movie VARCHAR(255),
  actor VARCHAR(255),
  PRIMARY KEY (movies_actors_id)
);

I run SELECT actor, COUNT(1) FROM movies_actors GROUP BY actor to find out how many movies each actor has been in. But I'd also like to know what percentage of the total that count represents.

I figured I could do this:

SELECT
  actor,
  COUNT(1) AS total,
  COUNT(1) / (SELECT COUNT(1) FROM movies_actors) * 100 AS avg
FROM movies_actors
GROUP BY actor;

But that just... seems... gross.

Any ideas?
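As a baseline, the question's scalar-subquery approach can be exercised end to end with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: SQLite stands in for MySQL, and the sample rows and the helper `percent_per_actor` are hypothetical. One engine difference matters: SQLite truncates INTEGER / INTEGER, whereas MySQL's `/` returns a decimal, so the sketch multiplies by `100.0` before dividing.

```python
import sqlite3

# Hypothetical sample rows: (movies_actors_id, movie, actor).
ROWS = [
    (1, "Movie A", "Alice"),
    (2, "Movie B", "Alice"),
    (3, "Movie C", "Alice"),
    (4, "Movie A", "Bob"),
    (5, "Movie D", "Carol"),
]

# Scalar-subquery version from the question. 100.0 comes first so that
# SQLite performs real (not integer) division.
QUERY = """
SELECT actor,
       COUNT(1) AS total,
       100.0 * COUNT(1) / (SELECT COUNT(1) FROM movies_actors) AS percentage
FROM movies_actors
GROUP BY actor
ORDER BY actor
"""


def percent_per_actor(rows):
    """Load rows into an in-memory table and return (actor, total, percentage) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE movies_actors ("
            "movies_actors_id INT PRIMARY KEY, "
            "movie VARCHAR(255), actor VARCHAR(255))"
        )
        conn.executemany("INSERT INTO movies_actors VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for actor, total, pct in percent_per_actor(ROWS):
    print(actor, total, pct)
```

With the five sample rows it prints `Alice 3 60.0`, `Bob 1 20.0` and `Carol 1 20.0`.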

5 answers:

Answer 0 (score: 22)

For large sets, a JOIN may perform better than a subquery:

SELECT ma.actor
     , COUNT(1) AS total
     , COUNT(1) / t.cnt * 100 AS `percentage`
  FROM movies_actors ma
 CROSS
  JOIN (SELECT COUNT(1) AS cnt FROM movies_actors) t
 GROUP
    BY ma.actor

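For large sets, and when a large share of the rows is being returned, a JOIN can often outperform a subquery. In your case the subquery is not correlated, so MySQL shouldn't need to execute it more than once, and it probably won't make any difference.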
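The cross-join form can be tried the same way. A minimal sketch, again using SQLite through Python's `sqlite3` as a stand-in for MySQL, with hypothetical sample rows and a hypothetical helper `percent_per_actor`: the derived table `t` yields a single row holding the grand total, and that row is attached to every row of `movies_actors` before grouping. The sketch also lists `t.cnt` in the GROUP BY, which keeps the statement valid if a strict grouping mode such as MySQL's ONLY_FULL_GROUP_BY is enabled.

```python
import sqlite3

# Hypothetical sample rows: (movies_actors_id, movie, actor).
ROWS = [
    (1, "Movie A", "Alice"),
    (2, "Movie B", "Alice"),
    (3, "Movie C", "Alice"),
    (4, "Movie A", "Bob"),
    (5, "Movie D", "Carol"),
]

# t is computed once and cross-joined onto every row; t.cnt is grouped on too,
# which is harmless because t has exactly one row.
QUERY = """
SELECT ma.actor,
       COUNT(1) AS total,
       COUNT(1) * 100.0 / t.cnt AS percentage
FROM movies_actors ma
CROSS JOIN (SELECT COUNT(1) AS cnt FROM movies_actors) t
GROUP BY ma.actor, t.cnt
ORDER BY ma.actor
"""


def percent_per_actor(rows):
    """Load rows into an in-memory table and return (actor, total, percentage) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE movies_actors ("
            "movies_actors_id INT PRIMARY KEY, "
            "movie VARCHAR(255), actor VARCHAR(255))"
        )
        conn.executemany("INSERT INTO movies_actors VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for actor, total, pct in percent_per_actor(ROWS):
    print(actor, total, pct)
```

The result matches the scalar-subquery version; the difference is only in how the total is made available to the grouped rows.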

Answer 1 (score: 1)

Without a join or a second query, letting WITH ROLLUP produce the grand total:

select actor,counter,  100 * counter / @total as percentage
from(
select actor, 
        case when actor is null
            then @total := count(*)
            else count(*)
        end as counter
    from movies_actors 
    group by actor
    with rollup
) mytable

Answer 2 (score: 0)

I'm not sure it's "better", but you could do a SUM and move the math elsewhere (into the derived table):

SELECT actor,
    COUNT(1) AS total,
    SUM(oneMoviePercentPts) AS percentage
FROM movies_actors
CROSS JOIN 
(
    SELECT 100 / CAST(COUNT(1) AS DECIMAL(15,4)) AS oneMoviePercentPts 
    FROM movies_actors
) t
GROUP BY actor

I'd hope the MySQL optimizer is smart enough not to run your subquery multiple times, but the join syntax makes that explicit.
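The SUM-of-shares idea can be sketched in SQLite via Python's `sqlite3`, with hypothetical sample rows and a hypothetical helper `percent_per_actor`. One adjustment is needed: SQLite maps `CAST(... AS DECIMAL(15,4))` to NUMERIC affinity, which leaves a whole-number count as an integer, so `100 / CAST(...)` would truncate there; the sketch writes `100.0 / COUNT(1)` instead. Summing floating-point shares can drift slightly when the total does not divide 100 evenly, so a ROUND may be wanted for display.

```python
import sqlite3

# Hypothetical sample rows: (movies_actors_id, movie, actor).
ROWS = [
    (1, "Movie A", "Alice"),
    (2, "Movie B", "Alice"),
    (3, "Movie C", "Alice"),
    (4, "Movie A", "Bob"),
    (5, "Movie D", "Carol"),
]

# Each row carries the percentage points worth one row (100 / total);
# summing them per actor gives that actor's share of the total.
QUERY = """
SELECT actor,
       COUNT(1) AS total,
       SUM(oneMoviePercentPts) AS percentage
FROM movies_actors
CROSS JOIN (
    SELECT 100.0 / COUNT(1) AS oneMoviePercentPts
    FROM movies_actors
) t
GROUP BY actor
ORDER BY actor
"""


def percent_per_actor(rows):
    """Load rows into an in-memory table and return (actor, total, percentage) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE movies_actors ("
            "movies_actors_id INT PRIMARY KEY, "
            "movie VARCHAR(255), actor VARCHAR(255))"
        )
        conn.executemany("INSERT INTO movies_actors VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for actor, total, pct in percent_per_actor(ROWS):
    print(actor, total, pct)
```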

Answer 3 (score: 0)

When you need a value computed from the same table (here, the total row count) next to the grouped data, do a self cross join:

SELECT
m.actor,
COUNT(m.actor) AS total,
(COUNT(m.actor) / t.total_movies) * 100 AS avg
FROM movies_actors m
CROSS JOIN (SELECT COUNT(*) AS total_movies FROM movies_actors) t
GROUP BY m.actor;

Answer 4 (score: 0)

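This worked for me (a per-month breakdown; the inner query accumulates the grand total in the user variable @myCumul, and the outer query divides each month's count by it):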

SELECT tmpTotal.yearmonth, tmpTotal.rec_count, 
      (tmpTotal.rec_count / @myCumul) * 100 AS myPercentage
FROM
(
  SELECT tmpResult.*, @myCumul := @myCumul + tmpResult.rec_count AS myNewCumul
  FROM
  (
    SELECT date_format(d.created_at, '%Y/%m') as yearmonth, count(*) rec_count
    FROM cf4a_webapp.factTable d 
      join cf4a_webapp.dimTable c on (d.client_id = c.id)
    WHERE c.id = 25 
      AND d.created_at >= '2019-01-01 00:00:00' 
      AND d.created_at < '2020-01-01 00:00:00'
    GROUP BY yearmonth
  ) tmpResult
  JOIN (SELECT @myCumul := 0) tmpCumul
) tmpTotal;
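Answer 4 obtains the grand total by assigning @myCumul inside the query; MySQL documents the evaluation order of such assignments as undefined, and MySQL 8.0 deprecates setting user variables within expressions. Where window functions are available (MySQL 8.0+, SQLite 3.25+), `SUM(COUNT(*)) OVER ()` puts the grand total next to every grouped row instead. This is a swapped-in technique, not taken from any of the answers; the minimal sketch below runs it on SQLite through Python's `sqlite3`, using the question's table with hypothetical sample rows and a hypothetical helper `percent_per_actor`.

```python
import sqlite3

# Hypothetical sample rows: (movies_actors_id, movie, actor).
ROWS = [
    (1, "Movie A", "Alice"),
    (2, "Movie B", "Alice"),
    (3, "Movie C", "Alice"),
    (4, "Movie A", "Bob"),
    (5, "Movie D", "Carol"),
]

# The window SUM runs over the already-grouped rows, so it adds up the
# per-actor counts into the grand total without a subquery or variable.
QUERY = """
SELECT actor,
       COUNT(*) AS total,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS percentage
FROM movies_actors
GROUP BY actor
ORDER BY actor
"""


def percent_per_actor(rows):
    """Load rows into an in-memory table and return (actor, total, percentage) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE movies_actors ("
            "movies_actors_id INT PRIMARY KEY, "
            "movie VARCHAR(255), actor VARCHAR(255))"
        )
        conn.executemany("INSERT INTO movies_actors VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for actor, total, pct in percent_per_actor(ROWS):
    print(actor, total, pct)
```

The same shape applies to the monthly report in answer 4: group by the formatted month and divide each COUNT(*) by SUM(COUNT(*)) OVER ().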