I have a data schema / CSV like this (columns: id ... score) with more than 90,000 rows, and I need to produce the rank column shown below, grouping by year and then by class. Can anyone help solve this in MySQL or R?

id  Year  class  name  score  rank
1   2010  Phy    joe   95     2
2   2010  Phy    amy   98     1
3   2010  Phy    carl  58     3
4   2010  Mat    joe   88     3
5   2010  Mat    amy   100    1
6   2010  Mat    carl  95     2
7   2011  Phy    joe   84     1
8   2011  Phy    amy   25     3
9   2011  Phy    carl  48     2
10  2011  Mat    joe   56     2
11  2011  Mat    amy   85     1
12  2011  Mat    carl  48     3
Answer 0 (score: 1)
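Assuming your data is stored in R as a data.frame named dd, you can compute the ranks with
dd$ranks <- with(dd, ave(score, Year, class, FUN=function(x) rank(-x)))
Note that rank has several options for handling ties, so you may want to read its documentation to see which one suits you.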
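To make the tie options concrete, here is a rough Python sketch of what ranking by descending score within each (Year, class) group computes. It is only an illustration, not the R implementation: the function names rank_desc and rank_within_groups are made up for this example, and only two tie rules are shown, "average" (the default of R's rank) and "min".

```python
from collections import defaultdict


def rank_desc(scores, ties="average"):
    """Rank scores so the highest gets rank 1, similar in spirit to R's rank(-x)."""
    ordered = sorted(scores, reverse=True)
    # Collect the 1-based sorted positions occupied by each distinct score.
    positions = defaultdict(list)
    for pos, score in enumerate(ordered, start=1):
        positions[score].append(pos)
    if ties == "average":
        # Tied scores share the mean of the positions they occupy.
        return [sum(positions[s]) / len(positions[s]) for s in scores]
    if ties == "min":
        # Tied scores all take the best (smallest) position.
        return [min(positions[s]) for s in scores]
    raise ValueError("unsupported ties method: " + ties)


def rank_within_groups(rows, ties="average"):
    """rows is a list of (year, class, score); ranks come back in input order."""
    groups = defaultdict(list)
    for i, (year, cls, _score) in enumerate(rows):
        groups[(year, cls)].append(i)
    ranks = [None] * len(rows)
    for indexes in groups.values():
        group_ranks = rank_desc([rows[i][2] for i in indexes], ties)
        for i, r in zip(indexes, group_ranks):
            ranks[i] = r
    return ranks


# Hypothetical scores with a tie (not taken from the question's data).
print(rank_desc([95, 98, 95, 58]))              # average: the two 95s share 2.5
print(rank_desc([95, 98, 95, 58], ties="min"))  # min: the two 95s both get 2
```

With the question's data there are no ties inside any group, so both rules give the same ranks there; the choice only matters once two students in the same year and class have equal scores.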
Answer 1 (score: 0)
Consider the following...
SET NAMES utf8;
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,Year INT NOT NULL
,class VARCHAR(12) NOT NULL
,name VARCHAR(12) NOT NULL
,score INT
);
INSERT INTO my_table VALUES
(1 ,2010 ,'Phy','joe',95),
(2 ,2010 ,'Phy','amy', 98 ),
(3 ,2010 ,'Phy','carl', 58 ),
(4 ,2010 ,'Mat','joe', 88 ),
(5 ,2010 ,'Mat','amy', 100 ),
(6 ,2010 ,'Mat','carl', 95 ),
(7 ,2011 ,'Phy','joe', 84 ),
(8 ,2011 ,'Phy','amy', 25 ),
(9 ,2011 ,'Phy','carl', 48 ),
(10 ,2011 ,'Mat','joe', 56 ),
(11 ,2011 ,'Mat','amy', 85 ),
(12 ,2011 ,'Mat','carl', 48 );
1)
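-- Rank = number of rows in the same year and class that score at least as high.
SELECT x.*
, COUNT(*) AS `rank` -- backticks: RANK is a reserved word in MySQL 8.0
FROM my_table x
JOIN my_table y
ON y.year = x.year
AND y.class = x.class
AND y.score >= x.score
GROUP
BY x.id -- one output row per input row, even when scores tie
ORDER
BY x.id;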
+----+------+-------+------+-------+------+
| id | Year | class | name | score | rank |
+----+------+-------+------+-------+------+
| 1 | 2010 | Phy | joe | 95 | 2 |
| 2 | 2010 | Phy | amy | 98 | 1 |
| 3 | 2010 | Phy | carl | 58 | 3 |
| 4 | 2010 | Mat | joe | 88 | 3 |
| 5 | 2010 | Mat | amy | 100 | 1 |
| 6 | 2010 | Mat | carl | 95 | 2 |
| 7 | 2011 | Phy | joe | 84 | 1 |
| 8 | 2011 | Phy | amy | 25 | 3 |
| 9 | 2011 | Phy | carl | 48 | 2 |
| 10 | 2011 | Mat | joe | 56 | 2 |
| 11 | 2011 | Mat | amy | 85 | 1 |
| 12 | 2011 | Mat | carl | 48 | 3 |
+----+------+-------+------+-------+------+
2)
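SELECT id
, year
, class
, name
, score
, `rank`
FROM
( -- Walk the rows in (year, class, score DESC) order and restart the counter
-- whenever the year or class changes. This relies on row-by-row evaluation
-- of user variables, an order MySQL does not guarantee.
SELECT x.*
, IF(@pclass = class,IF(@pyear=year,@i:=@i+1,@i:=1),@i:=1) AS `rank`
, @pyear := year
, @pclass := class
FROM my_table x
, (SELECT @pyear:='',@pclass:='',@i:=1) vals
ORDER
BY year,class,score DESC
) m
ORDER
BY id;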
+----+------+-------+------+-------+------+
| id | year | class | name | score | rank |
+----+------+-------+------+-------+------+
| 1 | 2010 | Phy | joe | 95 | 2 |
| 2 | 2010 | Phy | amy | 98 | 1 |
| 3 | 2010 | Phy | carl | 58 | 3 |
| 4 | 2010 | Mat | joe | 88 | 3 |
| 5 | 2010 | Mat | amy | 100 | 1 |
| 6 | 2010 | Mat | carl | 95 | 2 |
| 7 | 2011 | Phy | joe | 84 | 1 |
| 8 | 2011 | Phy | amy | 25 | 3 |
| 9 | 2011 | Phy | carl | 48 | 2 |
| 10 | 2011 | Mat | joe | 56 | 2 |
| 11 | 2011 | Mat | amy | 85 | 1 |
| 12 | 2011 | Mat | carl | 48 | 3 |
+----+------+-------+------+-------+------+
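Suggestion 2 is likely to be orders of magnitude faster than suggestion 1, because suggestion 1 joins every row against all the rows in the same year and class.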
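As a further option that is not part of the answer above: MySQL 8.0 and later support window functions, so the same ranking can be written with RANK() OVER (PARTITION BY ...). The sketch below is only a way to try the query quickly. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption made so it runs without a server, and it needs SQLite 3.25 or newer), and the alias score_rank is a name chosen for this example.

```python
import sqlite3

# The sample rows from the answer above: (id, year, class, name, score).
ROWS = [
    (1, 2010, "Phy", "joe", 95), (2, 2010, "Phy", "amy", 98),
    (3, 2010, "Phy", "carl", 58), (4, 2010, "Mat", "joe", 88),
    (5, 2010, "Mat", "amy", 100), (6, 2010, "Mat", "carl", 95),
    (7, 2011, "Phy", "joe", 84), (8, 2011, "Phy", "amy", 25),
    (9, 2011, "Phy", "carl", 48), (10, 2011, "Mat", "joe", 56),
    (11, 2011, "Mat", "amy", 85), (12, 2011, "Mat", "carl", 48),
]

# Rank by descending score inside each (year, class) partition.
QUERY = """
    SELECT id, year, class, name, score,
           RANK() OVER (PARTITION BY year, class ORDER BY score DESC) AS score_rank
    FROM my_table
    ORDER BY id
"""


def ranked_rows():
    """Load the sample rows into an in-memory table and return them with ranks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table ("
        "id INTEGER PRIMARY KEY, year INTEGER NOT NULL, "
        "class TEXT NOT NULL, name TEXT NOT NULL, score INTEGER)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in ranked_rows():
    print(row)
```

The alias is score_rank rather than rank because RANK is a reserved word in MySQL 8.0. With RANK(), tied scores share the same rank and the next rank is skipped; DENSE_RANK() or ROW_NUMBER() can be swapped in if a different tie rule is wanted.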