Top 'n' results for each keyword

Time: 2012-03-13 20:58:06

Tags: mysql

I have a query that gets the top 'n' users who commented on a specific keyword:

SELECT `user`, COUNT(*) AS magnitude
FROM `results`
WHERE `keyword` = 'economy'
GROUP BY `user`
ORDER BY magnitude DESC
LIMIT 5;

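I have approximately 6000 keywords and would like to run this query to get the top 'n' users for every keyword we have data for. Assistance appreciated.

2 Answers:

Answer 0: (score: 3)

Since you haven't given the schema for results, I'll assume it is this or something very similar (possibly with extra columns):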

create table results (
  id int primary key,
  user int,
    foreign key (user) references <some_other_table>(id),
  keyword varchar(<30>)
);

Step 1: Aggregate by keyword/user as in your example query, but for all keywords:

create view user_keyword as (
  select
    keyword,
    user,
    count(*) as magnitude
  from results
  group by keyword, user
);

Step 2: Rank each user within each keyword group (note the use of a correlated subquery to rank the rows):

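create view keyword_user_ranked as (
  select
    l.keyword,
    l.user,
    l.magnitude,
    -- rank = number of users of the same keyword whose count is at least as large
    (select count(*)
     from user_keyword r
     where r.keyword = l.keyword and r.magnitude >= l.magnitude
    ) as `rank`  -- backticks: RANK is a reserved word as of MySQL 8.0
  from
    user_keyword l
);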

Step 3: Select only the rows whose rank is at most some number:

select *
from keyword_user_ranked
where `rank` <= 3;
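
The three steps can be checked end to end with a small script. This is only a sketch and not part of the original answer: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, loads the sample rows from the example below, and uses the column alias rnk (a name chosen here) instead of rank. The function name top_users_per_keyword is likewise hypothetical.

```python
import sqlite3

# (user, keyword) pairs copied from the sample "results" table in this answer.
ROWS = [
    (1, "mysql"), (1, "mysql"), (2, "mysql"), (1, "query"), (2, "query"),
    (2, "query"), (2, "query"), (1, "table"), (2, "table"), (1, "table"),
    (3, "table"), (3, "mysql"), (3, "query"), (2, "mysql"), (1, "mysql"),
    (1, "mysql"), (3, "query"), (4, "mysql"), (4, "mysql"), (5, "mysql"),
]


def top_users_per_keyword(rows, n):
    """Return (keyword, user, magnitude, rnk) rows with rnk <= n."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table results ("
        "id integer primary key, user int, keyword varchar(30))"
    )
    conn.executemany(
        "insert into results (user, keyword) values (?, ?)", rows
    )
    conn.executescript(
        """
        -- Step 1: count rows per (keyword, user).
        create view user_keyword as
          select keyword, user, count(*) as magnitude
          from results
          group by keyword, user;

        -- Step 2: rank = users of the same keyword with a count >= this one.
        create view keyword_user_ranked as
          select l.keyword, l.user, l.magnitude,
                 (select count(*)
                  from user_keyword r
                  where r.keyword = l.keyword
                    and r.magnitude >= l.magnitude) as rnk
          from user_keyword l;
        """
    )
    # Step 3: keep only rows whose rank is at most n.
    result = conn.execute(
        "select keyword, user, magnitude, rnk "
        "from keyword_user_ranked where rnk <= ? "
        "order by keyword, rnk, user",
        (n,),
    ).fetchall()
    conn.close()
    return result
```

With n = 2 this returns the same four rows as the "top 2 per keyword" output in the example, which is a quick way to confirm the views behave as described before running them against the real table.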

Example:

Base data used:

mysql> select * from results;
+----+------+---------+
| id | user | keyword |
+----+------+---------+
|  1 |    1 | mysql   |
|  2 |    1 | mysql   |
|  3 |    2 | mysql   |
|  4 |    1 | query   |
|  5 |    2 | query   |
|  6 |    2 | query   |
|  7 |    2 | query   |
|  8 |    1 | table   |
|  9 |    2 | table   |
| 10 |    1 | table   |
| 11 |    3 | table   |
| 12 |    3 | mysql   |
| 13 |    3 | query   |
| 14 |    2 | mysql   |
| 15 |    1 | mysql   |
| 16 |    1 | mysql   |
| 17 |    3 | query   |
| 18 |    4 | mysql   |
| 19 |    4 | mysql   |
| 20 |    5 | mysql   |
+----+------+---------+

Grouped by keyword and user:

mysql> select * from user_keyword order by keyword, magnitude desc;
+---------+------+-----------+
| keyword | user | magnitude |
+---------+------+-----------+
| mysql   |    1 |         4 |
| mysql   |    2 |         2 |
| mysql   |    4 |         2 |
| mysql   |    3 |         1 |
| mysql   |    5 |         1 |
| query   |    2 |         3 |
| query   |    3 |         2 |
| query   |    1 |         1 |
| table   |    1 |         2 |
| table   |    2 |         1 |
| table   |    3 |         1 |
+---------+------+-----------+

Users ranked within each keyword:

mysql> select * from keyword_user_ranked order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql   |    1 |         4 |    1 |
| mysql   |    2 |         2 |    3 |
| mysql   |    4 |         2 |    3 |
| mysql   |    3 |         1 |    5 |
| mysql   |    5 |         1 |    5 |
| query   |    2 |         3 |    1 |
| query   |    3 |         2 |    2 |
| query   |    1 |         1 |    3 |
| table   |    1 |         2 |    1 |
| table   |    3 |         1 |    3 |
| table   |    2 |         1 |    3 |
+---------+------+-----------+------+

Only the top 2 per keyword:

mysql> select * from keyword_user_ranked where rank <= 2 order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql   |    1 |         4 |    1 |
| query   |    2 |         3 |    1 |
| query   |    3 |         2 |    2 |
| table   |    1 |         2 |    1 |
+---------+------+-----------+------+

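Note that when there are ties (see users 2 and 4 for the keyword "mysql" in the example), all tied parties get the "last" rank: if the 2nd and 3rd places are tied, both are assigned rank 3. This is why the rank <= 2 filter above returns only one row for "mysql".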
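
If that tie behaviour is not what you want, window functions offer other conventions. The sketch below is an addition, not something the original answer used: it assumes a database with window function support (MySQL 8.0 or later; here Python's sqlite3 with SQLite 3.25 or later stands in) and compares RANK(), DENSE_RANK() and ROW_NUMBER() on the "mysql" rows of the user_keyword example. The function name rank_variants and the aliases rnk, dense_rnk and row_num are made up for illustration.

```python
import sqlite3

# (keyword, user, magnitude) rows for keyword "mysql" from the example above.
COUNTS = [
    ("mysql", 1, 4), ("mysql", 2, 2), ("mysql", 4, 2),
    ("mysql", 3, 1), ("mysql", 5, 1),
]


def rank_variants(counts):
    """Return (user, magnitude, rnk, dense_rnk, row_num) per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table user_keyword (keyword text, user int, magnitude int)"
    )
    conn.executemany("insert into user_keyword values (?, ?, ?)", counts)
    rows = conn.execute(
        """
        select user, magnitude,
               -- ties share the FIRST rank of the tied block; gaps follow
               rank() over (partition by keyword
                            order by magnitude desc) as rnk,
               -- ties share a rank; no gaps
               dense_rank() over (partition by keyword
                                  order by magnitude desc) as dense_rnk,
               -- no ties: user id breaks them, so exactly n rows pass "<= n"
               row_number() over (partition by keyword
                                  order by magnitude desc, user) as row_num
        from user_keyword
        order by keyword, magnitude desc, user
        """
    ).fetchall()
    conn.close()
    return rows
```

For these rows RANK() yields 1, 2, 2, 4, 4 where the subquery approach yields 1, 3, 3, 5, 5, so a filter of rank <= 2 would keep users 2 and 4 as well. ROW_NUMBER() is the one to pick when each keyword must return exactly n users.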


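Performance: adding an index on the keyword and user columns will help. I have a table that is queried in a similar way, where the two columns have 4000 and 1300 distinct values (in a 600000-row table). You can add the index like this: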

alter table results add index keyword_user (keyword, user);

In my case, the query time dropped from about 6 seconds to about 2 seconds.

Answer 1: (score: 0)

You can use a pattern like this (from Within-group quotas (Top N per group)):

SELECT tmp.ID, tmp.entrydate 
FROM ( 
  SELECT 
    ID, entrydate, 
    IF( @prev <> ID, @rownum := 1, @rownum := @rownum+1 ) AS rank, 
    @prev := ID 
  FROM test t 
  JOIN (SELECT @rownum := NULL, @prev := 0) AS r 
  ORDER BY t.ID 
) AS tmp 
WHERE tmp.rank <= 2 
ORDER BY ID, entrydate;

+------+------------+
| ID   | entrydate  | 
+------+------------+ 
|    1 | 2007-05-01 | 
|    1 | 2007-05-02 | 
|    2 | 2007-06-03 | 
|    2 | 2007-06-04 | 
|    3 | 2007-07-01 | 
|    3 | 2007-07-02 | 
+------+------------+
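
The pattern above numbers rows within each group while scanning them in sorted order, then keeps the rows whose number is at most n. Because MySQL does not define the evaluation order of expressions that assign user variables, it can help to see the intended logic spelled out procedurally. The following Python sketch is an illustration only, not code from the linked article: it applies the same prev/rownum bookkeeping to the (keyword, user, magnitude) counts from answer 0, and the function name top_n_per_group is hypothetical.

```python
# (keyword, user, magnitude) rows from the user_keyword example in answer 0.
COUNTS = [
    ("mysql", 1, 4), ("mysql", 2, 2), ("mysql", 4, 2), ("mysql", 3, 1),
    ("mysql", 5, 1), ("query", 2, 3), ("query", 3, 2), ("query", 1, 1),
    ("table", 1, 2), ("table", 2, 1), ("table", 3, 1),
]


def top_n_per_group(counts, n):
    """Keep the first n rows of each keyword after sorting by magnitude."""
    # Sort by keyword, then highest magnitude first, then user id so that
    # ties are broken deterministically.
    ordered = sorted(counts, key=lambda row: (row[0], -row[2], row[1]))

    prev = None   # plays the role of @prev
    rownum = 0    # plays the role of @rownum
    kept = []
    for keyword, user, magnitude in ordered:
        # Restart numbering whenever the group (keyword) changes.
        rownum = 1 if keyword != prev else rownum + 1
        prev = keyword
        if rownum <= n:
            kept.append((keyword, user, magnitude))
    return kept
```

Unlike the rank-based views in answer 0, this row-numbering approach returns exactly n rows per keyword whenever at least n exist, because tied users are ordered by an arbitrary tie-break (here, the user id) instead of sharing a rank.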