R - Add a flag in a new column only for the top/bottom ranked rows

Date: 2015-11-12 00:37:23

Tags: r dataframe ranking

(R beginner level, RStudio on Win7)

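I have a data frame with a ranking within each state. I want to flag the top rank (rank 1) as "best" and the bottom rank as "worst", but each subset has a different number of members, so I have to find the maximum index for each state and then update the column "level". I can do "best" but can't work out how to identify "worst", and I don't want to use a loop: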

mystate <- c(rep("TX", 5), rep("AL", 3), rep("NM", 7))
mycounty <- c("TX1", "TX2", "TX3", "TX4", "TX5", "AL1", "AL2", "AL3", "NM1", "NM2", "NM3", "NM4", "NM5", "NM6", "NM7")
mycrime <- c(5, 6, 22, 5, 12, 17, 4, 16, 3, 7, 3, 5, 3, NA, 16)
mydf <- data.frame(mystate, mycounty, mycrime)
mydf$rank <- NA
# rank crime within each state; ties are broken by order of appearance
mydf <- transform(mydf, rank = ave(mycrime, mystate, FUN = function(x) rank(x, ties.method = "first")))
mydf$level <- NA
# rank 1 (lowest crime) in each state is "best"
mydf[mydf$rank == 1, "level"] <- "best"
# flag worst next
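One way to finish this without a loop, assuming a missing crime value should never be flagged (the expected output below marks NM7, not NM6, as "worst" in NM): rank() puts NA values last by default, so within a state the non-missing values hold ranks 1..n_valid, and the worst county is the one whose rank equals n_valid. This is a minimal sketch; n_valid is an illustrative variable name, not something from the question.

```r
# Question's data and "best" step
mystate  <- c(rep("TX", 5), rep("AL", 3), rep("NM", 7))
mycounty <- c("TX1", "TX2", "TX3", "TX4", "TX5", "AL1", "AL2", "AL3",
              "NM1", "NM2", "NM3", "NM4", "NM5", "NM6", "NM7")
mycrime  <- c(5, 6, 22, 5, 12, 17, 4, 16, 3, 7, 3, 5, 3, NA, 16)
mydf <- data.frame(mystate, mycounty, mycrime)
mydf <- transform(mydf, rank = ave(mycrime, mystate,
                                   FUN = function(x) rank(x, ties.method = "first")))
mydf$level <- NA
mydf[mydf$rank == 1, "level"] <- "best"

# n_valid (illustrative name): number of non-missing crime values per state,
# repeated on every row of that state by ave()
n_valid <- ave(as.numeric(!is.na(mydf$mycrime)), mydf$mystate, FUN = sum)

# NAs were ranked last, so the highest non-missing rank equals n_valid
mydf[mydf$rank == n_valid, "level"] <- "worst"
mydf
```

If NA rows should count as "worst" instead, comparing rank against ave(mydf$rank, mydf$mystate, FUN = max) would flag the maximum rank, which in NM is the NA row.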

The result should look like this:

    mystate mycounty mycrime rank level
 1       TX      TX1       5    1  best
 2       TX      TX2       6    3  <NA>
 3       TX      TX3      22    5  worst
 4       TX      TX4       5    2  <NA>
 5       TX      TX5      12    4  <NA>
 6       AL      AL1      17    3  worst
 7       AL      AL2       4    1  best
 8       AL      AL3      16    2  <NA>
 9       NM      NM1       3    1  best
 10      NM      NM2       7    5  <NA>
 11      NM      NM3       3    2  <NA>
 12      NM      NM4       5    4  <NA>
 13      NM      NM5       3    3  <NA>
 14      NM      NM6      NA    7  <NA>
 15      NM      NM7      16    6  worst 

Thanks for your help.

2 answers:

Answer 0 (score: 1)

Base R. Here is a way to get both "worst" and "best" in one go:

mydf <- data.frame(mystate, mycounty, mycrime)

z = ave(mydf$mycrime, mydf$mystate, FUN = function(x) {
  r = rank(x, ties.method="first")
  factor(r, levels = range(r))
})

mydf$level = factor(z, labels = c("best", "worst"))

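ave can't do the whole job on its own because it can't return a factor (as far as I know). It stores the factor's integer codes in z (1 for "best", 2 for "worst", NA otherwise), so the final factor() call is needed to attach the labels.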
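One caveat when this is run on the question's data: because rank() puts NA last by default, the NA row NM6 gets the highest rank in NM, so the code above flags NM6 as "worst" while the expected output flags NM7. A minimal sketch of a variant, assuming missing crime values should never be flagged, ranks with na.last = "keep" and drops the NA from range(). The explicit as.integer() only makes visible what ave() does anyway, which is keep the factor's integer codes.

```r
# Question's data
mystate  <- c(rep("TX", 5), rep("AL", 3), rep("NM", 7))
mycounty <- c("TX1", "TX2", "TX3", "TX4", "TX5", "AL1", "AL2", "AL3",
              "NM1", "NM2", "NM3", "NM4", "NM5", "NM6", "NM7")
mycrime  <- c(5, 6, 22, 5, 12, 17, 4, 16, 3, 7, 3, 5, 3, NA, 16)
mydf <- data.frame(mystate, mycounty, mycrime)

z <- ave(mydf$mycrime, mydf$mystate, FUN = function(x) {
  # na.last = "keep": a missing crime value gets rank NA instead of the top rank
  r <- rank(x, na.last = "keep", ties.method = "first")
  # code 1 = lowest non-NA rank (best), 2 = highest non-NA rank (worst), NA otherwise
  as.integer(factor(r, levels = range(r, na.rm = TRUE)))
})

# turn the integer codes into labelled factor levels
mydf$level <- factor(z, levels = 1:2, labels = c("best", "worst"))
mydf
```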

dplyr and data.table analogues

library(dplyr)
mydf %>% group_by(mystate) %>% mutate(
  r     = rank(mycrime, ties.method="first"),
  level = factor(r, levels = range(r), labels = c("best", "worst")),
  r     = NULL
)

# or...
library(data.table)
setDT(mydf)[, level := {
  r = rank(mycrime, ties.method="first")
  factor(r, levels = range(r), labels = c("best", "worst"))
}, by=mystate]

Answer 1 (score: 0)

1) No packages. Use ave to compute a 0/1 vector that is 1 for the worst county in each state and 0 otherwise, then use ifelse to set the value of level:

is.max <- function(x) seq_along(x) == which.max(x)
worst <- with(mydf, ave(mycrime, mystate, FUN = is.max))
transform(mydf, level = ifelse(worst, "worst", level))

giving:

    mystate mycounty mycrime rank level
 1       TX      TX1       5    1  best
 2       TX      TX2       6    3  <NA>
 3       TX      TX3      22    5  worst
 4       TX      TX4       5    2  <NA>
 5       TX      TX5      12    4  <NA>
 6       AL      AL1      17    3  worst
 7       AL      AL2       4    1  best
 8       AL      AL3      16    2  <NA>
 9       NM      NM1       3    1  best
 10      NM      NM2       7    5  <NA>
 11      NM      NM3       3    2  <NA>
 12      NM      NM4       5    4  <NA>
 13      NM      NM5       3    3  <NA>
 14      NM      NM6      NA    7  <NA>
 15      NM      NM7      16    6  worst

2) dplyr. Using is.max from above and dplyr, this can be done as follows:

library(dplyr)
mydf %>% group_by(mystate) %>% mutate(level = ifelse(is.max(mycrime), "worst", level))

3) data.table. Using is.max from above and data.table:
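Item 1 sets only the "worst" flag and keeps the "best" values the question already computed. As a base-R sketch that is not part of the original answer, both flags can be derived the same way by pairing is.max with is.min, a hypothetical counterpart built on which.min(). which.max() and which.min() skip NA, so NM6 is never flagged, and ties for the minimum go to the first county in row order, which agrees with rank 1 under ties.method = "first".

```r
# Question's data
mystate  <- c(rep("TX", 5), rep("AL", 3), rep("NM", 7))
mycounty <- c("TX1", "TX2", "TX3", "TX4", "TX5", "AL1", "AL2", "AL3",
              "NM1", "NM2", "NM3", "NM4", "NM5", "NM6", "NM7")
mycrime  <- c(5, 6, 22, 5, 12, 17, 4, 16, 3, 7, 3, 5, 3, NA, 16)
mydf <- data.frame(mystate, mycounty, mycrime)

# is.max is taken from the answer; is.min is a hypothetical counterpart
is.max <- function(x) seq_along(x) == which.max(x)
is.min <- function(x) seq_along(x) == which.min(x)

# ave() writes the logical results back into a numeric vector, i.e. as 1/0
best  <- with(mydf, ave(mycrime, mystate, FUN = is.min))
worst <- with(mydf, ave(mycrime, mystate, FUN = is.max))

# "best" takes precedence if a state has a single county
mydf$level <- ifelse(best == 1, "best", ifelse(worst == 1, "worst", NA))
mydf
```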