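The following code creates two-level groups (State within Test) and then ranks each observation within its group by ascending Grade. School is the tie-breaker.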
School<-rep(c("A","B","C","D"),each=10)
State<-rep(c("NY","NJ"),times=20)
Test<-rep(c("LSAT", "MCAT", "GRE","TOEFL","ACT"), times=8)
Grade<-trunc(rep((seq(from=500, to=600,length.out=4))))
dat<-data.frame(Test,State,School,Grade)
library(plyr)
dat<-ddply(dat, .(Test, State),transform,num=rank(Grade,ties.method="first"))
I use the following code to relabel the first-ranked item in each group as "lowest":
dat$num[dat$num==1]<-"lowest"
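In this example data frame every group always has 4 items, so I can use the following code to relabel the top-ranked item in each group as "highest":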
dat$num[dat$num==4]<-"highest"
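But how can I label an observation as "highest" when the number of rows is not constant across groups? The following code creates a version of the data frame in which one group has two extra rows.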
School<-rep(c("A","B","C","D"),each=10)
State<-rep(c("NY","NJ"),times=20)
Test<-rep(c("LSAT", "MCAT", "GRE","TOEFL","ACT"), times=8)
Grade<-trunc(rep((seq(from=500, to=600,length.out=4))))
dat1<-data.frame(Test,State,School,Grade)
dat1<-rbind(dat1,
data.frame(Test="ACT",State="NJ",School="E",Grade=550),
data.frame(Test="ACT",State="NJ",School="F",Grade=650))
library(plyr)
dat1<-ddply(dat1, .(Test, State),transform,num=rank(Grade,ties.method="first"))
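Answer 0 (score: 2)
You can do this by checking, within each group, which rank is the highest and which is the lowest, and assigning "highest"/"lowest" to those rows. Here I use ddply to do it, since you already use plyr in your code: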
dat1 <- ddply(dat1, .(Test, State), transform, num=ifelse(num == max(num), "highest",
ifelse(num == min(num), "lowest", num)))
> dat1
Test State School Grade num
1 ACT NJ A 533 lowest
2 ACT NJ B 600 4
3 ACT NJ C 533 2
4 ACT NJ D 600 5
5 ACT NJ E 550 3
6 ACT NJ F 650 highest
7 ACT NY A 500 lowest
8 ACT NY B 566 3
9 ACT NY C 500 2
10 ACT NY D 566 highest
11 GRE NJ A 600 3
12 GRE NJ B 533 lowest
13 GRE NJ C 600 highest
14 GRE NJ D 533 2
15 GRE NY A 566 3
16 GRE NY B 500 lowest
17 GRE NY C 566 highest
18 GRE NY D 500 2
19 LSAT NJ A 533 lowest
20 LSAT NJ B 600 3
21 LSAT NJ C 533 2
22 LSAT NJ D 600 highest
23 LSAT NY A 500 lowest
24 LSAT NY B 566 3
25 LSAT NY C 500 2
26 LSAT NY D 566 highest
27 MCAT NJ A 533 lowest
28 MCAT NJ B 600 3
29 MCAT NJ C 533 2
30 MCAT NJ D 600 highest
31 MCAT NY A 566 3
32 MCAT NY B 500 lowest
33 MCAT NY C 566 highest
34 MCAT NY D 500 2
35 TOEFL NJ A 600 3
36 TOEFL NJ B 533 lowest
37 TOEFL NJ C 600 highest
38 TOEFL NJ D 533 2
39 TOEFL NY A 500 lowest
40 TOEFL NY B 566 3
41 TOEFL NY C 500 2
42 TOEFL NY D 566 highest
If your data is large enough, you may also want to consider dplyr or data.table, which will be faster than plyr.
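As a rough illustration of the dplyr route mentioned above, the same labelling can be sketched with group_by(), mutate() and case_when(). The data frame toy, its values, and the helper column rnk are hypothetical stand-ins and not part of the question's data; this is one possible phrasing rather than a tuned implementation.

```r
library(dplyr)

# Hypothetical stand-in for dat1: two groups of unequal size
toy <- data.frame(
  Test  = c("ACT", "ACT", "ACT", "GRE", "GRE"),
  State = c("NJ", "NJ", "NJ", "NY", "NY"),
  Grade = c(533, 650, 600, 566, 500)
)

labelled <- toy %>%
  group_by(Test, State) %>%
  mutate(
    # rank within the group, as in the question
    rnk = rank(Grade, ties.method = "first"),
    # label the extremes; every other rank is kept as text
    num = case_when(
      rnk == max(rnk) ~ "highest",
      rnk == min(rnk) ~ "lowest",
      TRUE            ~ as.character(rnk)
    )
  ) %>%
  ungroup()

labelled$num
# "lowest" "highest" "2" "highest" "lowest"
```

Because the "highest" condition is tested first, a group with a single row would be labelled "highest" under this sketch.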
Answer 1 (score: 0)
Using dplyr with cut:
library(dplyr)
dat1%>%
group_by(Test, State) %>%
mutate(num=rank(Grade, ties.method="first"),
Categ= cut(num, breaks=c(-Inf, min(num), max(num)-1, Inf), labels=c("lowest", "medium", "highest")))%>%
arrange(Test,State,num)
#Source: local data frame [42 x 6]
#Groups: Test, State
# Test State School Grade num Categ
#1 ACT NJ A 533 1 lowest
#2 ACT NJ C 533 2 medium
#3 ACT NJ E 550 3 medium
#4 ACT NJ B 600 4 medium
#5 ACT NJ D 600 5 medium
#6 ACT NJ F 650 6 highest
#7 ACT NY A 500 1 lowest
#8 ACT NY C 500 2 medium
#9 ACT NY B 566 3 medium
#10 ACT NY D 566 4 highest
#11 GRE NJ B 533 1 lowest
#12 GRE NJ D 533 2 medium
#13 GRE NJ A 600 3 medium
#14 GRE NJ C 600 4 highest
#15 GRE NY B 500 1 lowest
#16 GRE NY D 500 2 medium
#17 GRE NY A 566 3 medium
#18 GRE NY C 566 4 highest
#19 LSAT NJ A 533 1 lowest
#20 LSAT NJ C 533 2 medium
#21 LSAT NJ B 600 3 medium
#22 LSAT NJ D 600 4 highest
#23 LSAT NY A 500 1 lowest
#24 LSAT NY C 500 2 medium
#25 LSAT NY B 566 3 medium
#26 LSAT NY D 566 4 highest
#27 MCAT NJ A 533 1 lowest
#28 MCAT NJ C 533 2 medium
#29 MCAT NJ B 600 3 medium
#30 MCAT NJ D 600 4 highest
#31 MCAT NY B 500 1 lowest
#32 MCAT NY D 500 2 medium
#33 MCAT NY A 566 3 medium
#34 MCAT NY C 566 4 highest
#35 TOEFL NJ B 533 1 lowest
#36 TOEFL NJ D 533 2 medium
#37 TOEFL NJ A 600 3 medium
#38 TOEFL NJ C 600 4 highest
#39 TOEFL NY A 500 1 lowest
#40 TOEFL NY C 500 2 medium
#41 TOEFL NY B 566 3 medium
#42 TOEFL NY D 566 4 highest
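To see why those breaks give the three labels, it may help to run cut() on the ranks of a single group in isolation. The vector num below is a hypothetical six-row group; the sketch assumes the group has at least three rows, since smaller groups would produce repeated or out-of-order breaks.

```r
# Ranks within one hypothetical group of six rows
num <- 1:6

# cut() closes intervals on the right by default, so the bins are
# (-Inf, 1], (1, 5] and (5, Inf)
categ <- cut(num,
             breaks = c(-Inf, min(num), max(num) - 1, Inf),
             labels = c("lowest", "medium", "highest"))

as.character(categ)
# "lowest" "medium" "medium" "medium" "medium" "highest"
```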
Answer 2 (score: 0)
Here is a data.table solution:
library(data.table)
setDT(dat1)
idx = dat1[, .I[c(which.min(num), which.max(num))], by="Test,State"]$V1
dat1[, num := as.character(num)][idx, num := c("lowest", "highest")]
# Test State School Grade num
# 1: ACT NJ A 533 lowest
# 2: ACT NJ B 600 4
# 3: ACT NJ C 533 2
# 4: ACT NJ D 600 5
# 5: ACT NJ E 550 3
# 6: ACT NJ F 650 highest
# 7: ACT NY A 500 lowest
# 8: ACT NY B 566 3
# ...
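dat1 is converted to a data.table. For each group of Test,State, the row numbers of dat1 corresponding to the minimum and the maximum of num are collected and stored in idx. num is then converted to character type, the rows are subset with idx, and the values of num are replaced with lowest and highest using R's recycling feature. Note that if a group has only one value, that value is both the minimum and the maximum, in which case this solution gives you highest (lowest will be overwritten).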
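The recycling step can be pictured with a plain base R vector. This is only an analogy for the assignment by idx described above, not data.table's own mechanism, and the values of num and idx here are made up for illustration.

```r
# Hypothetical rank column for two groups (rows 1-6 and rows 7-10), already character
num <- as.character(c(1, 4, 2, 5, 3, 6, 1, 3, 2, 4))

# Row numbers ordered as (min, max) per group, mirroring idx in the answer
idx <- c(1, 6, 7, 10)

# The two labels are recycled across the four positions
num[idx] <- c("lowest", "highest")
num
# "lowest" "4" "2" "5" "3" "highest" "lowest" "3" "2" "highest"

# A single-row group contributes the same row twice; the later label wins
single <- "1"
single[c(1, 1)] <- c("lowest", "highest")
single
# "highest"
```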
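Answer 3 (score: 0)
Here is a package-free version that is essentially the same as your original approach, with the extra fix of using ave() to find each group's maximum rank instead of the hard-coded 4. It is faster on the short dataset provided, but probably not on larger ones.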
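# flag lowest and highest while num is still numeric: once a label is
# assigned the column becomes character and max() would compare strings
is_lowest <- dat1$num == 1
is_highest <- dat1$num == ave(x = dat1$num,list(dat1$Test,dat1$State),FUN = max)
# mark lowest
dat1[is_lowest,'num'] <- 'lowest'
# mark highest
dat1[is_highest,'num'] <- 'highest'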
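For readers unfamiliar with ave(), a small sketch of what it returns may help: it applies FUN within each group and hands back a vector aligned with the original rows, which is what makes the row-wise comparison possible. The vectors grp and rnk are hypothetical.

```r
# Hypothetical grouping variable and numeric ranks
grp <- c("a", "a", "a", "b", "b")
rnk <- c(1, 2, 3, 1, 2)

# Group maximum repeated for every row of its group
grp_max <- ave(rnk, grp, FUN = max)
grp_max
# 3 3 3 2 2

# TRUE exactly where a row holds its group's highest rank
is_top <- rnk == grp_max
is_top
# FALSE FALSE TRUE FALSE TRUE
```

Here ave() is applied to a numeric vector. On a character vector max() compares strings, so the ranks are best kept numeric until the group maxima have been computed.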