My data frame looks like this:
head(temp$HName)
[1] "UNIVERSITY OF TEXAS HEALTH SCIENCE CENTER AT TYLER"
[2] "METHODIST HOSPITAL,THE"
[3] "TOMBALL REGIONAL MEDICAL CENTER"
[4] "METHODIST SUGAR LAND HOSPITAL"
[5] "GULF COAST MEDICAL CENTER"
[6] "VHS HARLINGEN HOSPITAL COMPANY LLC"
head(temp$Rate)
[1] 7.3 8.3 8.7 8.7 8.8 8.9
76 Levels: 7.3 8.3 8.7 8.8 8.9 9 9.1 9.2 9.3 9.4 9.5 9.6 ... 17.1
head(temp$Rank)
[1] NA NA NA NA NA NA
temp$Rate is already sorted. I am trying to write a function, assignRank, that gives me a new column, temp$Rank, with the values 1, 2, 3, 3, 4, 5.
My code is as follows:
tapply(temp$Rank,temp$Rate, assignRank)
where:
assignRank<- function(r=1){
temp$Rank <- r
r <- r + 1
return(r)
}
Running tapply gives:
tapply(temp$Rank,temp$Rate, assignRank)
Error in `$<-.data.frame`(`*tmp*`, "Rank", value = c(NA, NA)) :
replacement has 2 rows, data has 301
Can anyone tell me where I am going wrong?
Answer 0 (score: 4)
I would use data.table for something like this, since its syntax for both sorting and ranking is efficient and simple:
library(data.table)
setkey(setDT(temp), Rate) # This will sort your data set by Rate in case it's not yet sorted
temp[, Rank := .GRP, by = Rate]
temp
# HName Rate Rank
# 1: UNIVERSITY OF TEXAS HEALTH SCIENCE CENTER AT TYLER 7.3 1
# 2: METHODIST HOSPITAL,THE 8.3 2
# 3: TOMBALL REGIONAL MEDICAL CENTER 8.7 3
# 4: METHODIST SUGAR LAND HOSPITAL 8.7 3
# 5: GULF COAST MEDICAL CENTER 8.8 4
# 6: VHS HARLINGEN HOSPITAL COMPANY LLC 8.9 5
Or you can easily do the same thing in base R (assuming your data is sorted by Rate):
as.numeric(factor(temp$Rate))
## [1] 1 2 3 3 4 5
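One caveat with the factor approach, sketched below on made-up values: the result follows the order of the factor's levels. If Rate is a factor whose levels were built from text, they may be in alphabetical rather than numeric order (so "10.1" would come before "7.3"). Converting to numeric first sidesteps this. The vector `rate_fct` and its values are illustrative assumptions, not taken from the question's data.

```r
# Hypothetical factor whose levels are sorted as text: "10.1" < "7.3" < "8.7"
rate_fct <- factor(c("8.7", "10.1", "7.3", "8.7"))

# Ranking straight from the level codes follows the alphabetical level order
rank_txt <- as.integer(rate_fct)

# Convert factor -> character -> numeric, then re-factor: the levels are now
# sorted numerically, so the codes form a dense rank
rate_num <- as.numeric(as.character(rate_fct))
rank_num <- as.integer(factor(rate_num))

rank_txt  # 3 1 2 3
rank_num  # 2 3 1 2
```

Because factor() sorts its levels, the second form does not depend on the rows being sorted.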
Or you can also use the dense_rank function from the dplyr package (no need to sort the data set):
library(dplyr)
temp %>%
mutate(Rank = dense_rank(Rate))
# HName Rate Rank
# 1 UNIVERSITY OF TEXAS HEALTH SCIENCE CENTER AT TYLER 7.3 1
# 2 METHODIST HOSPITAL,THE 8.3 2
# 3 TOMBALL REGIONAL MEDICAL CENTER 8.7 3
# 4 METHODIST SUGAR LAND HOSPITAL 8.7 3
# 5 GULF COAST MEDICAL CENTER 8.8 4
# 6 VHS HARLINGEN HOSPITAL COMPANY LLC 8.9 5
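As for the error in the question, a likely reading is this: tapply passes each group's slice of temp$Rank to assignRank as `r`, so for the two hospitals with Rate 8.7 the function receives c(NA, NA) and tries to assign those two values to the whole 301-row column. The sketch below reproduces that on a five-row toy subset. The odd row count is a deliberate assumption: like 301, it cannot be filled by recycling a length-2 value.

```r
# Five-row toy subset of the question's data (Rate copied from the head() output)
temp <- data.frame(
  Rate = c(7.3, 8.3, 8.7, 8.7, 8.8),
  Rank = NA
)

assignRank <- function(r = 1) {
  temp$Rank <- r  # r is the group's slice of Rank; this edits a local copy of temp
  r <- r + 1      # this increment is lost when the call returns
  return(r)
}

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    tapply(temp$Rank, temp$Rate, assignRank)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# The global temp is untouched either way
print(temp$Rank)
```

Even without the length mismatch, the assignment inside the function only changes a local copy of temp, and `r` does not carry over between calls, so the counter never advances. That is why the vectorised approaches above are a better fit.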
Answer 1 (score: 0)
Other options (if the data is ordered):
with(temp, cumsum(ave(Rate, Rate, FUN=function(x) c(1,x[-1]!=x[-length(x)]))))
#[1] 1 2 3 3 4 5
with(temp, match(Rate, unique(Rate)) )
#[1] 1 2 3 3 4 5
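To illustrate why the ordering matters for match(Rate, unique(Rate)): unique() keeps values in order of first appearance, so on unsorted data the result numbers the groups by appearance rather than by size. A small variation, sorting the unique values first, appears to remove that requirement. The vector below is made up for illustration and assumes Rate is numeric.

```r
# Hypothetical unsorted rates
Rate <- c(8.7, 7.3, 8.9, 8.7)

# Numbers each value by where it first appears: only a rank if Rate is sorted
by_appearance <- match(Rate, unique(Rate))

# Sorting the unique values first gives a dense rank regardless of row order
dense <- match(Rate, sort(unique(Rate)))

by_appearance  # 1 2 3 1
dense          # 2 1 3 2
```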