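(R beginner, RStudio on Win7)
I have a data frame that is ranked within each state. I want to flag the lowest rank as "best" and the highest rank as "worst", but each subset has a different number of members, so I have to work out the maximum index for each state and then update the column "level". I can do "best" but cannot identify "worst", and I would rather not use a loop: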
mystate<- c(rep("TX",5),rep("AL",3),rep("NM",7))
mycounty<-c("TX1" ,"TX2", "TX3", "TX4", "TX5", "AL1", "AL2", "AL3", "NM1", "NM2", "NM3", "NM4", "NM5", "NM6", "NM7")
mycrime<-c(5,6,22,5,12,17,4,16,3,7,3,5,3,NA,16)
mydf<-data.frame(mystate,mycounty,mycrime)
mydf$rank<-NA
mydf <- transform(mydf,rank = ave(mycrime, mystate,FUN = function(x) rank(x, ties.method = "first")))
mydf$level <- NA
mydf[mydf$rank==1,"level"]<-"best"
# flag worst next
The result should look like this:
   mystate mycounty mycrime rank level
1       TX      TX1       5    1  best
2       TX      TX2       6    3  <NA>
3       TX      TX3      22    5 worst
4       TX      TX4       5    2  <NA>
5       TX      TX5      12    4  <NA>
6       AL      AL1      17    3 worst
7       AL      AL2       4    1  best
8       AL      AL3      16    2  <NA>
9       NM      NM1       3    1  best
10      NM      NM2       7    5  <NA>
11      NM      NM3       3    2  <NA>
12      NM      NM4       5    4  <NA>
13      NM      NM5       3    3  <NA>
14      NM      NM6      NA    7  <NA>
15      NM      NM7      16    6 worst
Thanks for your help.
Answer 0 (score: 1)
Base R. Here is a way to get "worst" and "best" in one go:
mydf <- data.frame(mystate, mycounty, mycrime)
z = ave(mydf$mycrime, mydf$mystate, FUN = function(x) {
  r = rank(x, ties.method = "first")
  factor(r, levels = range(r))
})
mydf$level = factor(z, labels = c("best", "worst"))
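ave cannot do the whole job on its own because it cannot return a factor (as far as I know); it hands back the numeric codes instead, hence the second factor() call.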
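To see why the second factor() call is needed, here is a minimal sketch on a toy vector. The names g, x and z are made up for illustration and are not part of the question's data. It suggests that ave() writes the factor's integer codes back into a numeric vector rather than keeping the factor itself:

```r
# Toy data (hypothetical): two groups of different sizes
g <- c("a", "a", "b", "b", "b")
x <- c(10, 20, 5, 15, 25)

z <- ave(x, g, FUN = function(v) {
  r <- rank(v, ties.method = "first")
  # Only the smallest and largest rank are kept as levels;
  # every other rank becomes NA
  factor(r, levels = range(r))
})

is.numeric(z)  # the factor class is gone, only codes remain
z              # code 1 = lowest rank, code 2 = highest rank, NA = in between

# Re-attach labels to the two surviving codes
factor(z, labels = c("best", "worst"))
```

Note that a group with a single row would make range(r) return the same value twice, which factor() rejects as duplicated levels, so this approach assumes at least two rows per group.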
dplyr and data.table analogues
library(dplyr)
mydf %>% group_by(mystate) %>% mutate(
  r = rank(mycrime, ties.method = "first"),  # rank the mycrime column within each state
  level = factor(r, levels = range(r), labels = c("best", "worst")),
  r = NULL                                   # drop the helper column again
)
# or...
library(data.table)
setDT(mydf)[, level := {
  r = rank(mycrime, ties.method = "first")  # rank the mycrime column within each state
  factor(r, levels = range(r), labels = c("best", "worst"))
}, by = mystate]
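One caveat, as far as I can tell: rank() places NA last by default, so in the NM group the NA row (NM6) gets the highest rank and would be flagged "worst", whereas the desired output in the question flags NM7. A possible variation of the base R version, assuming NA rows should never be flagged, keeps NA ranks as NA and ignores them when taking the range:

```r
mystate <- c(rep("TX", 5), rep("AL", 3), rep("NM", 7))
mycrime <- c(5, 6, 22, 5, 12, 17, 4, 16, 3, 7, 3, 5, 3, NA, 16)
mydf <- data.frame(mystate, mycrime)

z <- ave(mydf$mycrime, mydf$mystate, FUN = function(x) {
  # na.last = "keep" leaves the rank of an NA value as NA
  r <- rank(x, ties.method = "first", na.last = "keep")
  # na.rm = TRUE so the range covers only the ranked (non-NA) rows
  factor(r, levels = range(r, na.rm = TRUE))
})
mydf$level <- factor(z, labels = c("best", "worst"))

mydf[!is.na(mydf$level), ]  # show only the flagged rows
```

This is a sketch under that assumption rather than a drop-in replacement; it still expects every state to have at least two non-NA values.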
Answer 1 (score: 0)
1) No packages. Use ave to compute a 0/1 vector that is 1 for the worst row in each state and 0 otherwise, then use ifelse to set the value of level:
# TRUE only at the position of the group's maximum; which.max() skips NA
is.max <- function(x) seq_along(x) == which.max(x)
worst <- with(mydf, ave(mycrime, mystate, FUN = is.max))
transform(mydf, level = ifelse(worst, "worst", level))
giving:
   mystate mycounty mycrime rank level
1       TX      TX1       5    1  best
2       TX      TX2       6    3  <NA>
3       TX      TX3      22    5 worst
4       TX      TX4       5    2  <NA>
5       TX      TX5      12    4  <NA>
6       AL      AL1      17    3 worst
7       AL      AL2       4    1  best
8       AL      AL3      16    2  <NA>
9       NM      NM1       3    1  best
10      NM      NM2       7    5  <NA>
11      NM      NM3       3    2  <NA>
12      NM      NM4       5    4  <NA>
13      NM      NM5       3    3  <NA>
14      NM      NM6      NA    7  <NA>
15      NM      NM7      16    6 worst
2) dplyr. Using dplyr and is.max from above, it could be done like this:
library(dplyr)
mydf %>%
  group_by(mystate) %>%
  mutate(level = ifelse(is.max(mycrime), "worst", level))
3) data.table. The same is.max helper can also be applied per group with data.table, grouping by mystate.
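For the data.table variant, a minimal sketch might look like the following. It is my own reconstruction under the assumption that the "best" rows are flagged first, as in the question; the object name dt is made up for illustration:

```r
library(data.table)

# TRUE only at the position of the group's maximum; which.max() skips NA
is.max <- function(x) seq_along(x) == which.max(x)

dt <- data.table(
  mystate = c(rep("TX", 5), rep("AL", 3), rep("NM", 7)),
  mycrime = c(5, 6, 22, 5, 12, 17, 4, 16, 3, 7, 3, 5, 3, NA, 16)
)

# Same preparation as in the question: rank within state, flag rank 1 as "best"
dt[, rank := rank(mycrime, ties.method = "first"), by = mystate]
dt[, level := NA_character_]
dt[rank == 1, level := "best"]

# Flag the row holding each state's maximum as "worst", keep other values
dt[, level := ifelse(is.max(mycrime), "worst", level), by = mystate]
dt
```

Because which.max() ignores NA, the NA row in NM is not flagged here. A state whose mycrime values are all NA would break is.max, since which.max() then returns a zero-length result, so that case would need separate handling.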