How do I count only the top 5 items and group the rest into an "Other" category?

Time: 2012-04-29 15:34:34

Tags: mysql sql

I have a table like this:

+----+---------+-------------+
| id | user_id | screenWidth |
+----+---------+-------------+
|  1 |       1 |        1366 |
|  2 |       1 |        1366 |
|  3 |       1 |        1366 |
|  4 |       1 |        1366 |
|  5 |       2 |        1920 |
|  6 |       2 |        1920 |
|  7 |       3 |        1920 |
|  8 |       4 |        1280 |
|  9 |       5 |        1280 |
| 10 |       6 |        1280 |
| 11 |       7 |        1890 |
| ...|   ...   |     ...     |
| ...|   ...   |     ...     |
| ...|   ...   |     ...     |
| 100|       6 |        1910 |
+----+---------+-------------+

There are many different screen widths, but 90% of the rows have one of five values.

Using a query like this:

SELECT      screenwidth
        ,   COUNT(DISTINCT user_id) AS screenwidthcount
FROM        screenwidth
GROUP BY    screenwidth
ORDER BY    screenwidthcount;

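(Thanks to How do I count only the first occurrence of a value?)

I get a nice count of how many times each screenWidth occurs, with each user counted only once.

Is there a way to count the most popular screenWidths and then collect everything else into a category called "Other"? That is, instead of returning every row the query above returns, it would return 6 rows: the first 5 being the top 5 it currently returns, and the 6th, called Other, holding the sum of the remaining values.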

3 Answers:

Answer 0: (Score: 3)

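Here is one way to do it. The following script is based on the answers to the question Rank function in MySQL.

The query assigns a rank to every row of the computed distinct counts. I specified the value 2 in the CASE expression. This means the script will display the top 2 screen widths, and the rest will be shown as Other. You need to change the value to suit your requirements. I hard-coded the value 99999 so that all the other rows are grouped together.

There may be better ways to do this, but this is one way I could get it to work.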

Click here to view the demo in SQL Fiddle.

Script

CREATE TABLE screenwidth 
(
    id INT NOT NULL
  , user_id INT NOT NULL
  , screenwidth INT NOT NULL
);

INSERT INTO screenwidth (id, user_id, screenwidth) VALUES
  (1, 1, 1366),
  (2, 2, 1366),
  (3, 2, 1366),
  (4, 2, 1366),
  (5, 3, 1366),
  (6, 1, 1920),
  (7, 2, 1920),
  (8, 1, 1440),
  (9, 2, 1440),
  (10, 3, 1440),
  (11, 4, 1440),
  (12, 1, 1280),
  (13, 1, 1024),
  (14, 2, 1024),
  (15, 3, 1024),
  (16, 3, 1024),
  (17, 3, 1024),
  (18, 1, 1366);

SELECT screenwidth
    , SUM(screenwidthcount) AS screenwidth_count
FROM
(
    SELECT      CASE    
                    WHEN @curRank < 2 THEN screenwidth 
                    ELSE 'Other' 
                END AS screenwidth
            ,   screenwidthcount
            ,   @curRank := 
                (   CASE 
                        WHEN @curRank < 2 THEN @curRank + 1 
                        ELSE 99999
                    END
                ) AS rank
    FROM
    (
        SELECT      screenwidth
                ,   COUNT(DISTINCT user_id) AS screenwidthcount
        FROM        screenwidth
        GROUP BY    screenwidth
        ORDER BY    screenwidthcount DESC
    ) T1
                ,   (SELECT @curRank := 0) r
) T2
GROUP BY    screenwidth
ORDER BY    rank;

Output

SCREENWIDTH SCREENWIDTH_COUNT
----------- -----------------
1440               4
1024               3
Other              6
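
As a cross-check on the script above, here is a minimal sketch of the same "top N plus Other" result without user variables, whose evaluation order MySQL does not guarantee. It picks the top N widths in a derived table with ORDER BY ... LIMIT and left-joins the per-width counts to it. The sketch runs against SQLite through Python's sqlite3 module purely so that it is self-contained; the function name top_n_with_other, the label and users column names, and the tie-breaker (smaller width first, chosen so that the tie between 1366 and 1024 resolves the same way as in the output above) are assumptions, not part of the answer. In MySQL the cast would be CAST(... AS CHAR) rather than AS TEXT.

```python
import sqlite3

ROWS = [  # (id, user_id, screenwidth), copied from the INSERT statement above
    (1, 1, 1366), (2, 2, 1366), (3, 2, 1366), (4, 2, 1366), (5, 3, 1366),
    (6, 1, 1920), (7, 2, 1920),
    (8, 1, 1440), (9, 2, 1440), (10, 3, 1440), (11, 4, 1440),
    (12, 1, 1280),
    (13, 1, 1024), (14, 2, 1024), (15, 3, 1024), (16, 3, 1024), (17, 3, 1024),
    (18, 1, 1366),
]

TOP_N_SQL = """
SELECT label, SUM(n) AS users
FROM (
    -- c: distinct users per width; t: only the top-N widths.
    -- Widths missing from t get a NULL on the right side of the join.
    SELECT CASE WHEN t.screenwidth IS NULL THEN 'Other'
                ELSE CAST(c.screenwidth AS TEXT) END AS label,
           c.n AS n
    FROM (
        SELECT screenwidth, COUNT(DISTINCT user_id) AS n
        FROM screenwidth
        GROUP BY screenwidth
    ) AS c
    LEFT JOIN (
        SELECT screenwidth
        FROM screenwidth
        GROUP BY screenwidth
        ORDER BY COUNT(DISTINCT user_id) DESC, screenwidth ASC
        LIMIT ?
    ) AS t ON t.screenwidth = c.screenwidth
) AS x
GROUP BY label
ORDER BY label = 'Other', users DESC  -- keep the 'Other' row last
"""


def top_n_with_other(rows, n):
    """Return [(label, distinct_user_count), ...] with an 'Other' catch-all row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE screenwidth ("
        "id INT NOT NULL, user_id INT NOT NULL, screenwidth INT NOT NULL)"
    )
    conn.executemany("INSERT INTO screenwidth VALUES (?, ?, ?)", rows)
    result = conn.execute(TOP_N_SQL, (n,)).fetchall()
    conn.close()
    return result


print(top_n_with_other(ROWS, 2))
```

With n set to 2 this should reproduce the three rows shown in the output above; changing n to 5 gives the shape the question asks for.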

Answer 1: (Score: 1)

Try this:

select
  -- ranks 1..5 keep their own width; everything else is labelled 'Other'
  case when rank <= 5 then screenwidth else 'Other' end as screenwidth,
  sum(screenwidthcount) as screenwidthcount,
  -- ranks 6 and above all map to 6, so they fall into one group
  least(rank, 6) as LimitRank
from
(
  SELECT
  *, (@r := @r + 1) as rank
  FROM
  (
    SELECT      screenwidth
            ,   COUNT(DISTINCT user_id) AS screenwidthcount
    FROM        tbl
    GROUP BY    screenwidth
    ORDER BY    screenwidthcount desc, screenwidth desc
  ) AS X
  cross join (select @r := 0 as init ) rx
) as y
group by LimitRank

Sample data:

CREATE TABLE tbl
    (id int, user_id int, screenWidth int);

INSERT INTO tbl
    (id, user_id, screenWidth)
VALUES
    (1, 1, 1366),
    (2, 1, 1366),
    (3, 1, 1366),
    (4, 1, 1366),
    (5, 2, 1920),
    (6, 2, 1920),
    (7, 3, 1920),
    (8, 4, 1280),
    (9, 5, 1280),
    (10, 6, 1280),
    (11, 7, 1890),
    (12, 9, 1890),
    (13, 9, 1890),
    (13, 9, 1024),
    (13, 9, 800),
    (100, 6, 1910);

Output:

SCREENWIDTH SCREENWIDTHCOUNT    LIMITRANK
1280        3                   1
1920        2                   2
1890        2                   3
1910        1                   4
1366        1                   5
Other       2                   6

Live test: http://www.sqlfiddle.com/#!2/c0e94/33

Here is the result without the cap: http://www.sqlfiddle.com/#!2/c0e94/31

SCREENWIDTH SCREENWIDTHCOUNT
1280        3
1920        2
1890        2
1910        1
1366        1
1024        1
800         1

Answer 2: (Score: 0)

Yes, with a judicious CASE statement. I don't have MySQL, but this or something like it should work...

A. The inner Select produces a result set of screenwidth values together with the number of distinct users that have each screenwidth (this effectively counts each screenwidth once per user).

B. The outer query then groups that result set by the CASE expression and sums "Cnt", the number of users using each screen width. Widths used by five or more users keep their own row; all the others collapse into a single row with the value 0.

   Select case When Z.Cnt > 4 Then Z.screenwidth else 0 end screenwidth,
       Sum(Z.Cnt) screenwidthcount
   From (Select screenwidth, Count(Distinct user_id) Cnt
         From screenwidth
         Group By screenwidth) Z
   Group By case When Z.Cnt > 4 Then Z.screenwidth else 0 end

C. To label the catch-all row 'other' instead of 0, convert the width to a string inside the CASE expression. SQL Server has the Str() function for this; in MySQL, Cast(... As Char) does the same job.

   Select case When Z.Cnt > 4 Then Cast(Z.screenwidth As Char) else 'other' end screenwidth,
       Sum(Z.Cnt) screenwidthcount
   From (Select screenwidth, Count(Distinct user_id) Cnt
         From screenwidth
         Group By screenwidth) Z
   Group By case When Z.Cnt > 4 Then Cast(Z.screenwidth As Char) else 'other' end
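
To make the threshold idea in this answer concrete (keep every width that has at least a given number of distinct users, lump the rest into 'other'), here is a small sketch that can be run as-is. It uses SQLite through Python's sqlite3 module only so that it is self-contained, and it borrows the sample rows from Answer 1 because this answer does not come with data. The function name users_by_threshold, the min_users parameter, and the label and users column names are assumptions made for illustration. Note that this is a different rule from "top 5": the number of rows returned depends on the data.

```python
import sqlite3

ROWS = [  # (id, user_id, screenwidth): the sample data from Answer 1
    (1, 1, 1366), (2, 1, 1366), (3, 1, 1366), (4, 1, 1366),
    (5, 2, 1920), (6, 2, 1920), (7, 3, 1920),
    (8, 4, 1280), (9, 5, 1280), (10, 6, 1280),
    (11, 7, 1890), (12, 9, 1890), (13, 9, 1890),
    (13, 9, 1024), (13, 9, 800), (100, 6, 1910),
]

THRESHOLD_SQL = """
SELECT CASE WHEN z.cnt >= :min_users THEN CAST(z.screenwidth AS TEXT)
            ELSE 'other' END AS label,
       SUM(z.cnt) AS users
FROM (
    -- distinct users per width, so each user counts once per width
    SELECT screenwidth, COUNT(DISTINCT user_id) AS cnt
    FROM screenwidth
    GROUP BY screenwidth
) AS z
GROUP BY CASE WHEN z.cnt >= :min_users THEN CAST(z.screenwidth AS TEXT)
              ELSE 'other' END
"""


def users_by_threshold(rows, min_users):
    """Return {label: distinct_user_count}; widths below min_users share 'other'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE screenwidth (id INT, user_id INT, screenwidth INT)")
    conn.executemany("INSERT INTO screenwidth VALUES (?, ?, ?)", rows)
    result = dict(conn.execute(THRESHOLD_SQL, {"min_users": min_users}).fetchall())
    conn.close()
    return result


print(users_by_threshold(ROWS, 2))
```

With this data no width reaches five distinct users, so a threshold of 5 would put everything into 'other'; a threshold of 2 keeps 1280, 1920 and 1890 and groups the four single-user widths together.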