我有一个查询来获取评论特定关键字的前'n'用户
SELECT `user` , COUNT( * ) AS magnitude
FROM `results`
WHERE `keyword` = "economy"
GROUP BY `user`
ORDER BY magnitude DESC
LIMIT 5
我有大约6000个关键字,并希望运行此查询以获得我们拥有数据的每个关键字的顶级'n'用户。援助表示赞赏。
答案 0 :(得分:3)
由于您没有给出results
的架构,我会假设它是这个或非常相似(可能是额外的列):
create table results (
id int primary key,
user int,
foreign key (user) references <some_other_table>(id),
keyword varchar(<30>)
);
第1步:按照示例查询中的keyword/user
汇总,但对于所有关键字:
create view user_keyword as (
select
keyword,
user,
count(*) as magnitude
from results
group by keyword, user
);
第2步:为每个关键字组中的每个用户排名(请注意使用子查询对行进行排名):
create view keyword_user_ranked as (
select
keyword,
user,
magnitude,
(select count(*)
from user_keyword
where l.keyword = keyword and magnitude >= l.magnitude
) as rank
from
user_keyword l
);
第3步:仅选择排名小于某个数字的行:
select *
from keyword_user_ranked
where rank <= 3;
示例:
使用的基础数据:
mysql> select * from results;
+----+------+---------+
| id | user | keyword |
+----+------+---------+
| 1 | 1 | mysql |
| 2 | 1 | mysql |
| 3 | 2 | mysql |
| 4 | 1 | query |
| 5 | 2 | query |
| 6 | 2 | query |
| 7 | 2 | query |
| 8 | 1 | table |
| 9 | 2 | table |
| 10 | 1 | table |
| 11 | 3 | table |
| 12 | 3 | mysql |
| 13 | 3 | query |
| 14 | 2 | mysql |
| 15 | 1 | mysql |
| 16 | 1 | mysql |
| 17 | 3 | query |
| 18 | 4 | mysql |
| 19 | 4 | mysql |
| 20 | 5 | mysql |
+----+------+---------+
按关键字和用户分组:
mysql> select * from user_keyword order by keyword, magnitude desc;
+---------+------+-----------+
| keyword | user | magnitude |
+---------+------+-----------+
| mysql | 1 | 4 |
| mysql | 2 | 2 |
| mysql | 4 | 2 |
| mysql | 3 | 1 |
| mysql | 5 | 1 |
| query | 2 | 3 |
| query | 3 | 2 |
| query | 1 | 1 |
| table | 1 | 2 |
| table | 2 | 1 |
| table | 3 | 1 |
+---------+------+-----------+
在关键字中排名的用户:
mysql> select * from keyword_user_ranked order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql | 1 | 4 | 1 |
| mysql | 2 | 2 | 3 |
| mysql | 4 | 2 | 3 |
| mysql | 3 | 1 | 5 |
| mysql | 5 | 1 | 5 |
| query | 2 | 3 | 1 |
| query | 3 | 2 | 2 |
| query | 1 | 1 | 3 |
| table | 1 | 2 | 1 |
| table | 3 | 1 | 3 |
| table | 2 | 1 | 3 |
+---------+------+-----------+------+
每个关键字只有前2位:
mysql> select * from keyword_user_ranked where rank <= 2 order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql | 1 | 4 | 1 |
| query | 2 | 3 | 1 |
| query | 3 | 2 | 2 |
| table | 1 | 2 | 1 |
+---------+------+-----------+------+
请注意,当存在关联时 - 请参阅示例中的关键字“mysql”的用户2和4 - 关系中的所有方都获得“最后”排名,即如果第2和第3级被绑定,则两者都被分配排名3。
性能:为关键字和用户列添加索引会有所帮助。我有一个以类似方式查询的表,两列有4000和1300个不同的值(在600000行表中)。您可以像这样添加索引:
alter table results add index keyword_user (keyword, user);
就我而言,查询时间从大约6秒减少到大约2秒。
答案 1 :(得分:0)
您可以使用这样的模式(来自Within-group quotas (Top N per group)):
SELECT tmp.ID, tmp.entrydate
FROM (
SELECT
ID, entrydate,
IF( @prev <> ID, @rownum := 1, @rownum := @rownum+1 ) AS rank,
@prev := ID
FROM test t
JOIN (SELECT @rownum := NULL, @prev := 0) AS r
ORDER BY t.ID
) AS tmp
WHERE tmp.rank <= 2
ORDER BY ID, entrydate;
+------+------------+
| ID | entrydate |
+------+------------+
| 1 | 2007-05-01 |
| 1 | 2007-05-02 |
| 2 | 2007-06-03 |
| 2 | 2007-06-04 |
| 3 | 2007-07-01 |
| 3 | 2007-07-02 |
+------+------------+