How do I find the most frequent pair in SQL?

Time: 2020-02-05 03:59:24

Tags: mysql data-manipulation

I am trying to write a MySQL query that outputs the most frequently occurring pair of values. I have the following table:

Original Dataset

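This table contains users' music streaming activity on specific dates. I want to find out which pair of artists appears together most often on a given day. The answer should be (Pink Floyd, Queen), because 3 users listened to both artists on the same day. How can I achieve this?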

I started by joining the table to itself with the following code:

WITH temp AS (
    SELECT person_id, artist_name, COUNT(*) AS times_played
    FROM users
    WHERE date_played = '2020-10-01'
    GROUP BY 1, 2
)
SELECT a.person_id, a.artist_name, b.artist_name
FROM temp a
JOIN temp b
    ON a.person_id = b.person_id AND a.artist_name != b.artist_name;

The result is the following:

I'm not sure how to proceed from here, so any help would be appreciated!

Below is the code to create the table in MySQL:

create table users
(
  person_id       int,
  artist_name     varchar(255),
  date_played     date
);

insert into users
  (person_id, artist_name, date_played)
values
  (1, 'Pink Floyd', '2020-10-01'),
  (1, 'Led Zeppelin', '2020-10-01'),
  (1, 'Queen', '2020-10-01'),
  (1, 'Pink Floyd', '2020-10-01'),
  (2, 'Journey', '2020-10-01'),
  (2, 'Pink Floyd', '2020-10-01'),
  (2, 'Queen', '2020-10-01'),
  (2, 'Pink Floyd', '2020-10-01'),
  (3, 'Pink Floyd', '2020-10-01'),
  (3, 'Aerosmith', '2020-10-01'),
  (3, 'Queen', '2020-10-01'),
  (4, 'Pink Floyd', '2020-10-01'),
  (4, 'Led Zeppelin', '2020-10-01');

2 answers:

Answer 0: (score: 0)

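We can try handling this requirement with a self-join and the RANK() analytic function (CTEs and window functions require MySQL 8.0 or later):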

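WITH cte AS (
    SELECT
        u1.artist_name AS artist1,
        u2.artist_name AS artist2,
        -- Count distinct users rather than joined rows: a user who played the
        -- same artist more than once would otherwise inflate the pair's count.
        RANK() OVER (ORDER BY COUNT(DISTINCT u1.person_id) DESC) AS rnk
    FROM users u1
    INNER JOIN users u2
        ON u1.person_id = u2.person_id
        AND u1.date_played = u2.date_played
        -- "<" keeps each unordered pair once and excludes self-pairs.
        AND u1.artist_name < u2.artist_name
    GROUP BY
        u1.artist_name,
        u2.artist_name
)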

SELECT
    artist1,
    artist2
FROM cte
WHERE rnk = 1;
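
To see what the ranking is based on, it can help to look at the per-pair counts directly. The following is only a sketch under some assumptions that are not part of the answer: it loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module (SQLite standing in for MySQL, so no server is needed), and it counts distinct users per pair so that one user's repeated plays of an artist are counted once. The names ROWS, PAIR_COUNTS_SQL, pair_counts and num_users are made up for this illustration.

```python
import sqlite3

# Sample rows copied from the question. SQLite is an assumption here: it is
# used only so the check can run without a MySQL server.
ROWS = [
    (1, "Pink Floyd", "2020-10-01"),
    (1, "Led Zeppelin", "2020-10-01"),
    (1, "Queen", "2020-10-01"),
    (1, "Pink Floyd", "2020-10-01"),
    (2, "Journey", "2020-10-01"),
    (2, "Pink Floyd", "2020-10-01"),
    (2, "Queen", "2020-10-01"),
    (2, "Pink Floyd", "2020-10-01"),
    (3, "Pink Floyd", "2020-10-01"),
    (3, "Aerosmith", "2020-10-01"),
    (3, "Queen", "2020-10-01"),
    (4, "Pink Floyd", "2020-10-01"),
    (4, "Led Zeppelin", "2020-10-01"),
]

# Self-join on user and day; "<" keeps each unordered pair exactly once.
PAIR_COUNTS_SQL = """
SELECT u1.artist_name AS artist1,
       u2.artist_name AS artist2,
       COUNT(DISTINCT u1.person_id) AS num_users
FROM users u1
JOIN users u2
  ON u1.person_id = u2.person_id
 AND u1.date_played = u2.date_played
 AND u1.artist_name < u2.artist_name
GROUP BY u1.artist_name, u2.artist_name
ORDER BY num_users DESC, artist1, artist2
"""


def pair_counts():
    """Return (artist1, artist2, num_users) rows, most shared pair first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (person_id INT, artist_name TEXT, date_played TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(PAIR_COUNTS_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in pair_counts():
        print(row)  # first row printed: ('Pink Floyd', 'Queen', 3)
```

On the sample data, (Pink Floyd, Queen) comes first with 3 users and (Led Zeppelin, Pink Floyd) second with 2, which matches the answer expected in the question.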

Answer 1: (score: 0)

This is how I solved the problem, using the trick (u1.artist_name < u2.artist_name) that I found in the code Tim Biegeleisen posted in this thread:

WITH temp AS (
    -- One row per (user, artist) for the chosen day, so repeat plays count once.
    SELECT
        person_id,
        artist_name
    FROM users
    WHERE date_played = '2020-10-01'
    GROUP BY 1, 2
)
SELECT *
FROM (
    SELECT
        u1.artist_name AS artist1,
        u2.artist_name AS artist2,
        COUNT(*) AS times_played,  -- number of users who played both artists
        RANK() OVER (ORDER BY COUNT(*) DESC) AS Rnk
    FROM temp u1
    JOIN temp u2
        ON u1.artist_name < u2.artist_name AND u1.person_id = u2.person_id
    GROUP BY 1, 2
) sub
WHERE Rnk = 1;
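
For anyone wondering why the `<` comparison matters compared with the `!=` from the question, here is a small check. It is a sketch, not part of either answer, and it makes some assumptions: the sample rows from the question are loaded into an in-memory SQLite database through Python's sqlite3 module (SQLite standing in for MySQL), and the helper name count_joined_rows is made up for this example. It counts how many rows the self-join produces with each operator after deduplicating to one row per user and artist.

```python
import sqlite3

# Sample rows copied from the question. SQLite is an assumption here: it is
# used only so the check can run without a MySQL server.
ROWS = [
    (1, "Pink Floyd", "2020-10-01"),
    (1, "Led Zeppelin", "2020-10-01"),
    (1, "Queen", "2020-10-01"),
    (1, "Pink Floyd", "2020-10-01"),
    (2, "Journey", "2020-10-01"),
    (2, "Pink Floyd", "2020-10-01"),
    (2, "Queen", "2020-10-01"),
    (2, "Pink Floyd", "2020-10-01"),
    (3, "Pink Floyd", "2020-10-01"),
    (3, "Aerosmith", "2020-10-01"),
    (3, "Queen", "2020-10-01"),
    (4, "Pink Floyd", "2020-10-01"),
    (4, "Led Zeppelin", "2020-10-01"),
]

# One row per (user, artist) for the day; same effect as GROUP BY 1, 2.
DEDUPED = """
    SELECT DISTINCT person_id, artist_name
    FROM users
    WHERE date_played = '2020-10-01'
"""


def count_joined_rows(operator):
    """Count self-join rows when artist names are compared with `operator`."""
    if operator not in ("!=", "<"):
        raise ValueError("operator must be '!=' or '<'")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (person_id INT, artist_name TEXT, date_played TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", ROWS)
    sql = f"""
        SELECT COUNT(*)
        FROM ({DEDUPED}) a
        JOIN ({DEDUPED}) b
          ON a.person_id = b.person_id
         AND a.artist_name {operator} b.artist_name
    """
    (n,) = conn.execute(sql).fetchone()
    conn.close()
    return n


if __name__ == "__main__":
    print(count_joined_rows("!="))  # 20: every pair appears in both orders
    print(count_joined_rows("<"))   # 10: every pair appears once
```

With `!=`, each pair shows up twice per user, once as (A, B) and once as (B, A), so grouping would report mirrored duplicates. With `<`, only the alphabetically ordered version survives, and each user contributes one row per pair.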