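I'm trying to write a query in MySQL that outputs the pair of values that occurs most frequently. I have the following table (the code to create it, with sample data, is at the end of this question):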
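This table contains users' music streaming activity on specific dates. I want to find which pair of artists is most often played by the same user on the same day. The answer should be (Pink Floyd, Queen), because 3 users listened to both artists on the same day. How can I do this?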
I started by joining the table to itself with the following code:
With temp as (
    select person_id, artist_name, count(*) as times_played
    from users where date_played = '2020-10-01' group by 1, 2)
select a.person_id, a.artist_name, b.artist_name
from temp a join temp b
    on a.person_id = b.person_id and a.artist_name != b.artist_name;
The result was the following:
I'm not sure how to proceed from here, so any help would be greatly appreciated!
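To see what the `!=` join produces, here is a minimal sketch that runs the query above against the question's sample data. It assumes Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for MySQL 8.0, and the helper name `self_join_pairs` is hypothetical. With `!=`, every pair comes back in both orders, so person 1 contributes both (Pink Floyd, Queen) and (Queen, Pink Floyd); switching the comparison to `<` keeps each pair once per person.

```python
import sqlite3

# Sample rows from the question (all on 2020-10-01); SQLite stands in for MySQL here.
PLAYS = [
    (1, "Pink Floyd"), (1, "Led Zeppelin"), (1, "Queen"), (1, "Pink Floyd"),
    (2, "Journey"), (2, "Pink Floyd"), (2, "Queen"), (2, "Pink Floyd"),
    (3, "Pink Floyd"), (3, "Aerosmith"), (3, "Queen"),
    (4, "Pink Floyd"), (4, "Led Zeppelin"),
]


def self_join_pairs(op):
    """Run the question's self-join, comparing artist names with op ('!=' or '<')."""
    if op not in ("!=", "<"):
        raise ValueError("op must be '!=' or '<'")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (person_id INT, artist_name VARCHAR(255), date_played DATE)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, '2020-10-01')", PLAYS)
    sql = f"""
        WITH temp AS (
            SELECT person_id, artist_name, COUNT(*) AS times_played
            FROM users WHERE date_played = '2020-10-01' GROUP BY 1, 2)
        SELECT a.person_id, a.artist_name, b.artist_name
        FROM temp a JOIN temp b
          ON a.person_id = b.person_id AND a.artist_name {op} b.artist_name
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


ordered = self_join_pairs("!=")   # each pair appears as (A, B) and as (B, A)
unordered = self_join_pairs("<")  # each pair appears once per person
print(len(ordered), len(unordered))  # 20 10
```

With each pair appearing once per person, the remaining step is to group by the two artist names and count the people in each group, which is the route the answers below take.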
Below is the code to create the table in MySQL:
create table users
(
person_id int,
artist_name varchar(255),
date_played date
);
insert into users
(person_id, artist_name, date_played)
values
(1, 'Pink Floyd', '2020-10-01'),
(1, 'Led Zeppelin', '2020-10-01'),
(1, 'Queen', '2020-10-01'),
(1, 'Pink Floyd', '2020-10-01'),
(2, 'Journey', '2020-10-01'),
(2, 'Pink Floyd', '2020-10-01'),
(2, 'Queen', '2020-10-01'),
(2, 'Pink Floyd', '2020-10-01'),
(3, 'Pink Floyd', '2020-10-01'),
(3, 'Aerosmith', '2020-10-01'),
(3, 'Queen', '2020-10-01'),
(4, 'Pink Floyd', '2020-10-01'),
(4, 'Led Zeppelin', '2020-10-01');
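Before writing the SQL, it can help to pin down the expected result independently. The sketch below is a plain-Python cross-check (not a MySQL solution) over the same rows as the INSERT above: it collapses repeated plays into one set of artists per (person, day), counts every unordered pair once per person, and returns the pair(s) with the highest count. The function name `top_pairs` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# The rows from the INSERT statement above: (person_id, artist_name, date_played).
ROWS = [(p, a, "2020-10-01") for p, a in [
    (1, "Pink Floyd"), (1, "Led Zeppelin"), (1, "Queen"), (1, "Pink Floyd"),
    (2, "Journey"), (2, "Pink Floyd"), (2, "Queen"), (2, "Pink Floyd"),
    (3, "Pink Floyd"), (3, "Aerosmith"), (3, "Queen"),
    (4, "Pink Floyd"), (4, "Led Zeppelin"),
]]


def top_pairs(rows):
    """Return (keys, listeners): the (day, artist1, artist2) keys shared by the most people."""
    # Distinct artists per (person, day): playing Pink Floyd twice still counts once.
    artists = defaultdict(set)
    for person, artist, day in rows:
        artists[(person, day)].add(artist)
    # sorted() + combinations() yields each unordered pair once, like artist1 < artist2 in SQL.
    counts = Counter()
    for (person, day), names in artists.items():
        for a, b in combinations(sorted(names), 2):
            counts[(day, a, b)] += 1
    best = max(counts.values())
    return sorted(key for key, n in counts.items() if n == best), best


print(top_pairs(ROWS))  # ([('2020-10-01', 'Pink Floyd', 'Queen')], 3)
```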
Answer 0 (score: 0)
You can handle this requirement with a self-join and the RANK() analytic function (window functions require MySQL 8.0 or later):
WITH cte AS (
    SELECT
        u1.artist_name AS artist1,
        u2.artist_name AS artist2,
        RANK() OVER (ORDER BY COUNT(*) DESC) rnk
    FROM users u1
    INNER JOIN users u2
        -- "<" keeps each unordered pair once: (Pink Floyd, Queen) but not (Queen, Pink Floyd)
        ON u1.artist_name < u2.artist_name AND u1.person_id = u2.person_id
    WHERE
        u1.date_played = u2.date_played
    GROUP BY
        u1.artist_name,
        u2.artist_name
)
SELECT
    artist1,
    artist2
FROM cte
WHERE rnk = 1;
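One caveat with this query: `COUNT(*)` counts joined rows, not people. Persons 1 and 2 each played Pink Floyd twice, so on the sample data the (Pink Floyd, Queen) group has 5 rows rather than 3. The top pair happens to be the same here, but repeated plays could change the ranking on other data, and grouping only by artist names merges different days. The sketch below assumes an in-memory SQLite database (via Python's `sqlite3`) as a stand-in for MySQL 8.0; it compares that raw row count with a variant that uses `COUNT(DISTINCT u1.person_id)` and ranks pairs within each `date_played`. The CTE names `pair_counts` and `ranked` are hypothetical.

```python
import sqlite3

# Sample rows from the question (all on 2020-10-01).
PLAYS = [
    (1, "Pink Floyd"), (1, "Led Zeppelin"), (1, "Queen"), (1, "Pink Floyd"),
    (2, "Journey"), (2, "Pink Floyd"), (2, "Queen"), (2, "Pink Floyd"),
    (3, "Pink Floyd"), (3, "Aerosmith"), (3, "Queen"),
    (4, "Pink Floyd"), (4, "Led Zeppelin"),
]

# Answer 0's grouping, with the COUNT(*) it ranks by exposed as a column.
RAW_SQL = """
    SELECT u1.artist_name, u2.artist_name, COUNT(*)
    FROM users u1
    JOIN users u2
      ON u1.artist_name < u2.artist_name AND u1.person_id = u2.person_id
    WHERE u1.date_played = u2.date_played
    GROUP BY u1.artist_name, u2.artist_name
"""

# Variant: count distinct people per pair, then rank pairs within each day.
DISTINCT_SQL = """
    WITH pair_counts AS (
        SELECT u1.date_played,
               u1.artist_name AS artist1,
               u2.artist_name AS artist2,
               COUNT(DISTINCT u1.person_id) AS listeners
        FROM users u1
        JOIN users u2
          ON u1.person_id = u2.person_id
         AND u1.date_played = u2.date_played
         AND u1.artist_name < u2.artist_name
        GROUP BY u1.date_played, u1.artist_name, u2.artist_name
    ), ranked AS (
        SELECT pair_counts.*,
               RANK() OVER (PARTITION BY date_played ORDER BY listeners DESC) AS rnk
        FROM pair_counts
    )
    SELECT date_played, artist1, artist2, listeners FROM ranked WHERE rnk = 1
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (person_id INT, artist_name VARCHAR(255), date_played DATE)")
conn.executemany("INSERT INTO users VALUES (?, ?, '2020-10-01')", PLAYS)

raw = {(a, b): n for a, b, n in conn.execute(RAW_SQL)}
best = conn.execute(DISTINCT_SQL).fetchall()
conn.close()

print(raw[("Pink Floyd", "Queen")])  # 5 joined rows for 3 people
print(best)  # [('2020-10-01', 'Pink Floyd', 'Queen', 3)]
```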
Answer 1 (score: 0)
Here is how I solved it, using the trick (u1.artist_name < u2.artist_name) that I found in the code Tim Biegeleisen posted in this thread:
With temp AS (
    -- one row per (person, artist) for the day, so repeated plays count once
    SELECT
        person_id,
        artist_name
    FROM users
    WHERE date_played = '2020-10-01'
    GROUP BY 1, 2
)
SELECT *
FROM (
    SELECT
        u1.artist_name AS artist1,
        u2.artist_name AS artist2,
        -- after deduplication, COUNT(*) is the number of people who played both artists
        COUNT(*) AS times_played,
        RANK() OVER (ORDER BY COUNT(*) DESC) Rnk
    FROM temp u1
    JOIN temp u2
        ON u1.artist_name < u2.artist_name AND u1.person_id = u2.person_id
    GROUP BY 1, 2
) sub
WHERE Rnk = 1;
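To check this query without a MySQL server, the sketch below runs it, changed only in whitespace, against the question's sample data in an in-memory SQLite database through Python's `sqlite3` module. This assumes SQLite 3.25 or later, which supports `WITH`, `RANK()`, and positional `GROUP BY 1, 2` much like MySQL 8.0.

```python
import sqlite3

# Sample rows from the question (all on 2020-10-01).
PLAYS = [
    (1, "Pink Floyd"), (1, "Led Zeppelin"), (1, "Queen"), (1, "Pink Floyd"),
    (2, "Journey"), (2, "Pink Floyd"), (2, "Queen"), (2, "Pink Floyd"),
    (3, "Pink Floyd"), (3, "Aerosmith"), (3, "Queen"),
    (4, "Pink Floyd"), (4, "Led Zeppelin"),
]

# Answer 1's query, unchanged apart from whitespace.
ANSWER_1_SQL = """
    WITH temp AS (
        SELECT person_id, artist_name
        FROM users
        WHERE date_played = '2020-10-01'
        GROUP BY 1, 2
    )
    SELECT *
    FROM (
        SELECT u1.artist_name AS artist1,
               u2.artist_name AS artist2,
               COUNT(*) AS times_played,
               RANK() OVER (ORDER BY COUNT(*) DESC) Rnk
        FROM temp u1
        JOIN temp u2
          ON u1.artist_name < u2.artist_name AND u1.person_id = u2.person_id
        GROUP BY 1, 2
    ) sub
    WHERE Rnk = 1
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (person_id INT, artist_name VARCHAR(255), date_played DATE)")
conn.executemany("INSERT INTO users VALUES (?, ?, '2020-10-01')", PLAYS)
result = conn.execute(ANSWER_1_SQL).fetchall()
conn.close()

print(result)  # [('Pink Floyd', 'Queen', 3, 1)]
```

Because `RANK()` gives tied pairs the same rank, a tie for first place would return several rows rather than one; that is usually the desired behavior for "most frequent pair" questions.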