SQL - Most common value in a column across joined tables

Time: 2016-11-14 20:35:17

Tags: mysql sql

I have three tables, as follows:

Area (Id, Description)

City(Id, Name)

Problem(Id, City, Area, Definition):
 City references City (Id), Area references Area (Id)

For each city (Name), I want to find the area (Description) that occurs most often among that city's problems.

Example:

Area
Id   Description
1      Support
2      Finance  

City
Id      Name
1      Chicago
2      Boston

Problem
Id  City  Area  Definition
1     1     2       A
2     1     2       B
3     1     1       C
4     2     1       D

Expected output:

 Name         Description
 Chicago        Finance
 Boston         Support
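
To make the example reproducible, the sample data above could be loaded with a script along these lines. This is only a sketch: the column types, lengths, and constraints are assumptions, since the question gives column names and references only.

```sql
-- Column types and sizes are assumptions; only names and references are given.
CREATE TABLE Area (
    Id          INT PRIMARY KEY,
    Description VARCHAR(50) NOT NULL
);

CREATE TABLE City (
    Id   INT PRIMARY KEY,
    Name VARCHAR(50) NOT NULL
);

CREATE TABLE Problem (
    Id         INT PRIMARY KEY,
    City       INT NOT NULL,
    Area       INT NOT NULL,
    Definition VARCHAR(50),
    FOREIGN KEY (City) REFERENCES City (Id),
    FOREIGN KEY (Area) REFERENCES Area (Id)
);

-- Sample rows exactly as listed in the example tables.
INSERT INTO Area (Id, Description) VALUES (1, 'Support'), (2, 'Finance');
INSERT INTO City (Id, Name) VALUES (1, 'Chicago'), (2, 'Boston');
INSERT INTO Problem (Id, City, Area, Definition) VALUES
    (1, 1, 2, 'A'),
    (2, 1, 2, 'B'),
    (3, 1, 1, 'C'),
    (4, 2, 1, 'D');
```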

Here is what I tried, without success:

SELECT Name,
       Description
FROM
  (SELECT *
   FROM Problem AS P,
        City AS C,
        Area AS A
   WHERE C.Id = P.City
     AND A.Id = P.Area ) AS T1
WHERE Description =
    (SELECT Description
     FROM
       (SELECT *
        FROM Problem AS P,
             City AS C,
             Area AS A
        WHERE C.Id = P.City
          AND A.Id = P.Area ) AS T2
     WHERE T1.Name = T2.Name
     GROUP BY Description
     ORDER BY Count(Name) DESC LIMIT 1 )
GROUP BY Name,
         Description

Thanks!

2 answers:

Answer 0: (score: 1)

The area with the highest problem count for each city should be given by:

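  -- Count problems per (City, Area), then keep the rows whose count
  -- equals the highest count within the same city (ties return all).
  select C.Name, A.Description
  from (
      select P.City, P.Area, count(*) as Freq
      from Problem as P
      group by P.City, P.Area
  ) t1
  INNER JOIN City AS C ON t1.City = C.Id
  INNER JOIN Area AS A ON A.Id = t1.Area
  where t1.Freq = (
      select max(t2.Freq)
      from (
          select P2.City, count(*) as Freq
          from Problem as P2
          group by P2.City, P2.Area
      ) t2
      where t2.City = t1.City
  )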

Answer 1: (score: 1)

This is probably the shortest way to solve the problem:

select c.Name, a.Description
from City c
cross join Area a
where a.Id = (
    select p.Area
    from Problem p
    where p.City = c.Id
    group by p.Area
    order by count(*) desc, p.Area asc
    limit 1
)

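We use a CROSS JOIN to combine every City with every Area, but keep only the Area that has the highest Problem count for the given city, which is determined in the correlated subquery. If two areas tie for the highest count in a city, the one with the lower Area Id is picked (order by ... p.Area asc). Note that this sorts by Id, not alphabetically by Description.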

Result:

|    Name | Description |
|---------|-------------|
|  Boston |     Support |
| Chicago |     Finance |
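
Since the tie-break in the query above orders by p.Area, which is the area's Id, a variant that breaks ties alphabetically by Description might look like the sketch below. It assumes the same tables as in the question; the alias a2 is introduced here purely for illustration.

```sql
select c.Name, a.Description
from City c
cross join Area a
where a.Id = (
    select p.Area
    from Problem p
    join Area a2 on a2.Id = p.Area      -- bring in the description for sorting
    where p.City = c.Id
    group by p.Area, a2.Description
    order by count(*) desc, a2.Description asc
    limit 1
)
```

On the sample data there are no ties, so this should return the same two rows as the query above.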

Here is another, more complex solution that also includes the count.

select c.Name, a.Description, city_area_maxcount.mc as problem_count
from (
    select City, max(c) as mc
    from (
        select p.City, p.Area, count(*) as c
        from Problem p
        group by p.City, p.Area
    ) city_area_count
    group by City
) city_area_maxcount
join (
    select p.City, p.Area, count(*) as c
    from Problem p
    group by p.City, p.Area
) city_area_count
    on  city_area_count.City = city_area_maxcount.City
    and city_area_count.c = city_area_maxcount.mc
join City c on c.Id = city_area_count.City
join Area a on a.Id = city_area_count.Area

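The subquery used inside city_area_maxcount (aliased city_area_count) appears twice here (I hope MySQL can cache the result). If you think of it as a table, this becomes the common problem of finding the row with the top value per group. If two areas tie for the highest count in a city, both are selected.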

Result:

|    Name | Description | problem_count |
|---------|-------------|---------------|
|  Boston |     Support |             1 |
| Chicago |     Finance |             2 |

Demo: http://sqlfiddle.com/#!9/c66a5/2
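
The "top value per group" pattern described above can also be sketched with a common table expression and a window function, which avoids writing the counting subquery twice. This assumes MySQL 8.0 or later (earlier versions support neither CTEs nor window functions); the names ranked and rnk are made up for this example.

```sql
with city_area_count as (
    -- One row per (City, Area) with the number of problems.
    select p.City, p.Area, count(*) as c
    from Problem p
    group by p.City, p.Area
),
ranked as (
    -- rank() gives every area tied for the top count in a city rank 1.
    select City, Area, c,
           rank() over (partition by City order by c desc) as rnk
    from city_area_count
)
select ci.Name, a.Description, r.c as problem_count
from ranked r
join City ci on ci.Id = r.City
join Area a on a.Id = r.Area
where r.rnk = 1
```

Like the join-based query, this keeps all tied areas. If exactly one row per city is wanted, rank() could be replaced with row_number() and an extra sort key such as Area added to the window's order by.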