How do I return the most frequent value of a specific column for each group in a GROUP BY query?

Asked: 2015-10-02 13:54:29

Tags: mysql sql group-by

I have this sample table:

  sort_order  product      color    productid   price
  ----------  -------      ------   ---------   -----
      1       bicycle       red      2573257     50
      2       bicycle       red      0983989     40
      3       bicycle       red      2093802     45
      4       bicycle       blue     9283409     55
      5       bicycle       blue     3982734     60
      1       teddy bear    brown    9847598     20
      2       teddy bear    black    3975897     25
      3       teddy bear    white    2983428     30
      4       teddy bear    brown    3984939     35
      5       teddy bear    brown    0923842     30
      1       tricycle      pink     2356235     25
      2       tricycle      blue     2394823     30
      3       tricycle      blue     9338832     35
      4       tricycle      pink     2383939     30
      5       tricycle      blue     3982982     35

I want a query that returns the product, the average price, and the most frequent color.

So, for this example, I'd like the query to return:

product      most_frequent_color     average_price
-------      -------------------     -------------
bicycle      red                     50
teddy bear   brown                   28
tricycle     blue                    31

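The average part seems easy: group by product and use avg(price). But how do I solve the most frequent color part?

Here is the query I have worked out so far, but I don't know how to get most_frequent_color for each group: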

SELECT product, avg(price) AS average_price from products
WHERE sort_order <= 5
GROUP BY product

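In my real-world table, each group usually has more rows than I'm interested in, so I use the sort_order field to limit the number of rows per group.

For the rare groups where "color" is null in every row, or where more than one color is tied for most frequent, I'd like to return null in the most_frequent_color column.

Thanks for any help with this!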

3 Answers:

Answer 0: (score: 2)

You can use an additional query in the SELECT clause to effectively run an aggregate query over the same data:

SELECT   t.product,
         Avg ( t.price ) AS average_price,
         (
                  SELECT   IF ( Count(*) = t4.count, NULL, t2.color ) 'color'
                  FROM     products t2
                  JOIN
                           (
                                    SELECT   t3.product,
                                             t3.color,
                                             count(*) 'count'
                                    FROM     products t3
                                    GROUP BY t3.product ,
                                             t3.color
                                    ORDER BY count(*) DESC
                           ) t4
                  ON       t2.product = t4.product
                           AND t2.color <> t4.color
                  WHERE    t2.product = t.product
                  GROUP BY t2.color
                  ORDER BY count(*) DESC limit 1
         ) AS most_frequent_color
FROM     products t
WHERE    t.sort_order <= 5
GROUP BY t.product

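So, we link a second copy of products using the product column, select the count of each color (for that product) sorted with the most frequent at the top of the list, and then take only the first row, which gives the most frequent color value for that product.

This is a subquery in the SELECT clause, which is different from an inline view (a subquery placed in the FROM clause of the query).

Note: This works in MySQL, but it is not database agnostic.

Update: It now checks for more than one color with the same frequency and returns null.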
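The core idea described above (count each color for the product, sort descending, keep the first row) can be tried on the question's sample data. This is only a reduced sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server, it applies the question's sort_order filter inside the subquery as well, and it leaves out the tie check, so with tied colors it would return one of them arbitrarily instead of null.

```python
import sqlite3

# Sample rows from the question: (sort_order, product, color, productid, price).
ROWS = [
    (1, "bicycle", "red", "2573257", 50),
    (2, "bicycle", "red", "0983989", 40),
    (3, "bicycle", "red", "2093802", 45),
    (4, "bicycle", "blue", "9283409", 55),
    (5, "bicycle", "blue", "3982734", 60),
    (1, "teddy bear", "brown", "9847598", 20),
    (2, "teddy bear", "black", "3975897", 25),
    (3, "teddy bear", "white", "2983428", 30),
    (4, "teddy bear", "brown", "3984939", 35),
    (5, "teddy bear", "brown", "0923842", 30),
    (1, "tricycle", "pink", "2356235", 25),
    (2, "tricycle", "blue", "2394823", 30),
    (3, "tricycle", "blue", "9338832", 35),
    (4, "tricycle", "pink", "2383939", 30),
    (5, "tricycle", "blue", "3982982", 35),
]

# Correlated subquery in the SELECT list: for each product group, count the
# rows per color, sort by that count, and keep only the top row.
# Assumption: ties are NOT detected here; one of the tied colors is returned.
QUERY = """
SELECT t.product,
       AVG(t.price) AS average_price,
       (SELECT t2.color
          FROM products t2
         WHERE t2.product = t.product
           AND t2.sort_order <= 5
           AND t2.color IS NOT NULL
         GROUP BY t2.color
         ORDER BY COUNT(*) DESC
         LIMIT 1) AS most_frequent_color
  FROM products t
 WHERE t.sort_order <= 5
 GROUP BY t.product
 ORDER BY t.product
"""


def run(rows):
    """Load rows into an in-memory SQLite table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "sort_order INTEGER, product TEXT, color TEXT, "
        "productid TEXT, price INTEGER)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run(ROWS):
        print(row)
```

On the sample data no product has a tie for first place, so the result matches the output requested in the question.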

Answer 1: (score: 2)

SELECT m.product
     , AVG(m.price) avg_price
     , n.color most_frequent
  FROM my_table m
  JOIN 
     ( SELECT x.product
            , x.color
         FROM 
            ( SELECT product
                   , color
                   , COUNT(color) total
                FROM my_table
               GROUP
                  BY product
                   , color
            ) x
         JOIN
            ( SELECT product
                   , MAX(total) max_total
                FROM 
                   ( SELECT product
                          , color
                          , COUNT(color) total
                       FROM my_table
                      GROUP
                         BY product
                          , color
                   ) a
               GROUP
                  BY product
            ) y
         ON y.product = x.product
        AND y.max_total = x.total
     ) n
    ON n.product = m.product
 GROUP
    BY m.product;
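Since the query above comes without commentary, it may help to look at its intermediate results. The sketch below is an illustration only, assuming Python's built-in sqlite3 module in place of MySQL and a cut-down my_table holding just the product and color columns from the question. It runs the innermost per-(product, color) count, then the join against the per-product maximum that forms the derived table n.

```python
import sqlite3

# (product, color) pairs from the question's sample table; the other
# columns are left out because only the color counts matter here.
PAIRS = [
    ("bicycle", "red"), ("bicycle", "red"), ("bicycle", "red"),
    ("bicycle", "blue"), ("bicycle", "blue"),
    ("teddy bear", "brown"), ("teddy bear", "black"), ("teddy bear", "white"),
    ("teddy bear", "brown"), ("teddy bear", "brown"),
    ("tricycle", "pink"), ("tricycle", "blue"), ("tricycle", "blue"),
    ("tricycle", "pink"), ("tricycle", "blue"),
]

# Step 1: how often each color occurs within each product.
COUNTS = """
SELECT product, color, COUNT(color) AS total
  FROM my_table
 GROUP BY product, color
"""

# Step 2: keep only the colors whose count equals the product's maximum.
TOP = f"""
SELECT x.product, x.color
  FROM ({COUNTS}) x
  JOIN (SELECT product, MAX(total) AS max_total
          FROM ({COUNTS}) a
         GROUP BY product) y
    ON y.product = x.product
   AND y.max_total = x.total
 ORDER BY x.product, x.color
"""


def fetch(sql):
    """Run sql against a fresh in-memory copy of the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (product TEXT, color TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", PAIRS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


counts = fetch(COUNTS + " ORDER BY product, color")
top = fetch(TOP)
print(counts)
print(top)
```

If two colors were tied for the maximum, step 2 would return more than one row for that product, so this approach on its own does not produce the null the question asks for in that case.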

Answer 2: (score: 2)

Here is one way to do it.

Fiddle with sample data

select r.product, q.color, r.avgprice
from
    (
    select product, avg(price) as avgprice
    from t
    group by product
    ) r
join
    (
    select p.product, p.color
    from
        (
        select product, color, count(*) as cnt
        from t
        group by product, color
        ) p
    join
        (
        select product, max(cnt) as maxcnt
        from
            (
            select product, color, count(*) as cnt
            from t
            group by product, color
            ) x
        group by product
        ) y
    on y.product = p.product and y.maxcnt = p.cnt
    ) q
on r.product = q.product
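The question also asks for null when colors are tied or when every color in a group is null. One possible variation on the query above, offered as a sketch and not as part of the original answer: count with COUNT(color) so null colors score zero, keep a top color only when exactly one color reaches the maximum (HAVING COUNT(*) = 1), and use a LEFT JOIN so products without a unique top color still appear. The example assumes Python's built-in sqlite3 module in place of MySQL, a table t with only product, color and price, no sort_order filter, and two hypothetical products ("scooter" and "kite") that are not in the question and exist only to exercise the tie and all-null cases.

```python
import sqlite3

# (product, color, price) from the question's sample table.
SAMPLE = [
    ("bicycle", "red", 50), ("bicycle", "red", 40), ("bicycle", "red", 45),
    ("bicycle", "blue", 55), ("bicycle", "blue", 60),
    ("teddy bear", "brown", 20), ("teddy bear", "black", 25),
    ("teddy bear", "white", 30), ("teddy bear", "brown", 35),
    ("teddy bear", "brown", 30),
    ("tricycle", "pink", 25), ("tricycle", "blue", 30),
    ("tricycle", "blue", 35), ("tricycle", "pink", 30),
    ("tricycle", "blue", 35),
]

# Hypothetical extra rows (NOT from the question): a tie and an all-null group.
EXTRA = [
    ("scooter", "red", 100), ("scooter", "blue", 200),
    ("kite", None, 10), ("kite", None, 20),
]

QUERY = """
SELECT r.product, q.color AS most_frequent_color, r.avgprice
  FROM (SELECT product, AVG(price) AS avgprice
          FROM t
         GROUP BY product) r
  LEFT JOIN
       (SELECT p.product, MIN(p.color) AS color
          FROM (SELECT product, color, COUNT(color) AS cnt
                  FROM t
                 GROUP BY product, color) p
          JOIN (SELECT product, MAX(cnt) AS maxcnt
                  FROM (SELECT product, color, COUNT(color) AS cnt
                          FROM t
                         GROUP BY product, color) x
                 GROUP BY product) y
            ON y.product = p.product
           AND y.maxcnt = p.cnt
         WHERE p.cnt > 0          -- drops groups whose colors are all null
         GROUP BY p.product
        HAVING COUNT(*) = 1       -- drops products with a tie for first place
       ) q
    ON q.product = r.product
 ORDER BY r.product
"""


def run(rows):
    """Load rows into an in-memory SQLite table named t and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (product TEXT, color TEXT, price INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run(SAMPLE + EXTRA):
        print(row)
```

MIN(p.color) is only there to satisfy the GROUP BY; because HAVING keeps groups with a single row, it simply returns that row's color.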