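I am trying to rank hospitals within each state by lowest rate. When several hospitals have the same rate, the tie should be broken by sorting on hospital name alphabetically. So far I have managed to rank the rates within each state, but I cannot figure out how to break the ties and rank them without skipping numbers.
Here is what I have so far, using the following code: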
outcome_data <- read.csv("outcome-of-care-measures.csv", na.strings = "Not Available", stringsAsFactors = FALSE) # Read the csv file
myData <- outcome_data[, c(2, 7, 11)] # Keep only hospital name, state and heart attack rate
names(myData) <- c("Hospital.Name", "State", "rate") # Short column names, matching the output below
arr1 <- myData[complete.cases(myData[, 3]), ] # Remove rows with a missing rate
arr2 <- arr1[order(arr1$State, arr1$rate, arr1$Hospital.Name), ] # Sort by state, then rate, then hospital name
arr3 <- transform(arr2, rank = ave(rate, State, FUN = function(x) rank(x, ties.method = "min"))) # Rank by rate within each state
The output I currently get is:
Hospital.Name State rate rank
SOUTH PENINSULA HOSPITAL AK 10.8 1
YUKON KUSKOKWIM DELTA REG HOSPITAL AK 11.2 2
MAT-SU REGIONAL MEDICAL CENTER AK 11.4 3
PEACEHEALTH KETCHIKAN MEDICAL CENTER AK 11.4 3
ALASKA NATIVE MEDICAL CENTER AK 11.6 5
BARTLETT REGIONAL HOSPITAL AK 11.6 5
CENTRAL PENINSULA GENERAL HOSPITAL AK 11.6 5
PROVIDENCE ALASKA MEDICAL CENTER AK 12.4 8
ALASKA REGIONAL HOSPITAL AK 13.4 9
FAIRBANKS MEMORIAL HOSPITAL AK 15.6 10
GEORGE H. LANIER MEMORIAL HOSPITAL AL 8.8 1
EVERGREEN MEDICAL CENTER AL 9.1 2
BAPTIST MEDICAL CENTER EAST AL 9.6 3
LAWRENCE MEDICAL CENTER AL 9.9 4
ANDALUSIA REGIONAL HOSPITAL AL 10.1 5
JACKSON HOSPITAL & CLINIC INC AL 10.2 6
BIRMINGHAM VA MEDICAL CENTER AL 10.4 7
FLORALA MEMORIAL HOSPITAL AL 10.4 7
GROVE HILL MEMORIAL HOSPITAL AL 10.4 7
SPRINGHILL MEDICAL CENTER AL 10.4 7
WEDOWEE HOSPITAL AL 10.4 7
PARKWAY MEDICAL CENTER AL 10.5 12
ST VINCENT'S BIRMINGHAM AL 10.6 13
WIREGRASS MEDICAL CENTER AL 10.6 13
GADSDEN REGIONAL MEDICAL CENTER AL 10.7 15
HALE COUNTY HOSPITAL AL 10.7 15
MOBILE INFIRMARY AL 10.7 15
But what I want is:
Hospital.Name State rate rank
SOUTH PENINSULA HOSPITAL AK 10.8 1
YUKON KUSKOKWIM DELTA REG HOSPITAL AK 11.2 2
MAT-SU REGIONAL MEDICAL CENTER AK 11.4 3
PEACEHEALTH KETCHIKAN MEDICAL CENTER AK 11.4 4
ALASKA NATIVE MEDICAL CENTER AK 11.6 5
BARTLETT REGIONAL HOSPITAL AK 11.6 6
CENTRAL PENINSULA GENERAL HOSPITAL AK 11.6 7
PROVIDENCE ALASKA MEDICAL CENTER AK 12.4 8
ALASKA REGIONAL HOSPITAL AK 13.4 9
FAIRBANKS MEMORIAL HOSPITAL AK 15.6 10
GEORGE H. LANIER MEMORIAL HOSPITAL AL 8.8 1
EVERGREEN MEDICAL CENTER AL 9.1 2
BAPTIST MEDICAL CENTER EAST AL 9.6 3
LAWRENCE MEDICAL CENTER AL 9.9 4
ANDALUSIA REGIONAL HOSPITAL AL 10.1 5
JACKSON HOSPITAL & CLINIC INC AL 10.2 6
BIRMINGHAM VA MEDICAL CENTER AL 10.4 7
FLORALA MEMORIAL HOSPITAL AL 10.4 8
GROVE HILL MEMORIAL HOSPITAL AL 10.4 9
SPRINGHILL MEDICAL CENTER AL 10.4 10
WEDOWEE HOSPITAL AL 10.4 11
PARKWAY MEDICAL CENTER AL 10.5 12
ST VINCENT'S BIRMINGHAM AL 10.6 13
WIREGRASS MEDICAL CENTER AL 10.6 14
GADSDEN REGIONAL MEDICAL CENTER AL 10.7 15
HALE COUNTY HOSPITAL AL 10.7 16
MOBILE INFIRMARY AL 10.7 17
Any ideas?
Answer 0 (score: 1)
Since the data are already sorted, we only need row_number() within each state after the order step:
library(dplyr)
arr2 %>%
group_by(State) %>%
mutate(rank = row_number())
Or, if we start from 'arr1':
arr1 %>%
arrange(State, rate, Hospital.Name) %>%
group_by(State) %>%
mutate(rank = row_number())
Or using ave from base R:
with(arr2, ave(seq_along(State), State, FUN = seq_along))
#[1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
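As a minimal, self-contained sketch of the base R approach above, the snippet below uses a toy data frame built from six rows shown in the question (the name 'toy' and the variable names are hypothetical, not part of the original code). It also checks, as an assumption worth verifying on the full data, that keeping the original rank() call but switching to ties.method = "first" gives the same result once the rows are sorted, because "first" breaks ties by order of appearance.

```r
# Toy data: six rows copied from the question, deliberately out of order
toy <- data.frame(
  Hospital.Name = c("PEACEHEALTH KETCHIKAN MEDICAL CENTER",
                    "SOUTH PENINSULA HOSPITAL",
                    "MAT-SU REGIONAL MEDICAL CENTER",
                    "WIREGRASS MEDICAL CENTER",
                    "PARKWAY MEDICAL CENTER",
                    "ST VINCENT'S BIRMINGHAM"),
  State = c("AK", "AK", "AK", "AL", "AL", "AL"),
  rate  = c(11.4, 10.8, 11.4, 10.6, 10.5, 10.6),
  stringsAsFactors = FALSE
)

# Sort by state, then rate, then name, so tied rates are in alphabetical order
sorted <- toy[order(toy$State, toy$rate, toy$Hospital.Name), ]

# Number the rows 1..n within each state: no shared ranks, no skipped numbers
sorted$rank <- ave(seq_along(sorted$State), sorted$State, FUN = seq_along)

# Alternative: keep rank(), but break ties by order of appearance
sorted$rank_first <- ave(sorted$rate, sorted$State,
                         FUN = function(x) rank(x, ties.method = "first"))

print(sorted)
```

Both columns should agree here; the row-numbering version is simpler, while the rank() version stays closest to the code in the question.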
Answer 1 (score: 0)
With data.table this is relatively simple:
library(data.table)
# Read only the relevant columns from the csv file using data.table::fread
# (assumes the file's columns are actually named as in 'select')
outcome_data <- fread("outcome-of-care-measures.csv",
                      na.strings = "Not Available",
                      select = c("Hospital.Name", "State", "rate"))
# Drop rows with NA values (data.table provides a na.omit method)
outcome_data <- na.omit(outcome_data)
# Use data.table::setkey to sort/index by State, then rate, then hospital name
setkey(outcome_data, State, rate, Hospital.Name)
# Add a rank column by state; order within groups follows the key order above
# (.N is the number of rows in each State group)
outcome_data[, rank := seq_len(.N), by = .(State)]
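To try the data.table approach without the csv file, here is a small sketch on toy data taken from rows shown in the question. The table name 'toy' is hypothetical. It uses setorder() to sort in place and rowid() as a possible shorthand for the per-group row counter; this is offered as an equivalent under the assumption that the table is already sorted, not as a replacement for the code above.

```r
library(data.table)

# Toy data: six rows copied from the question, deliberately out of order
toy <- data.table(
  Hospital.Name = c("PEACEHEALTH KETCHIKAN MEDICAL CENTER",
                    "SOUTH PENINSULA HOSPITAL",
                    "MAT-SU REGIONAL MEDICAL CENTER",
                    "WIREGRASS MEDICAL CENTER",
                    "PARKWAY MEDICAL CENTER",
                    "ST VINCENT'S BIRMINGHAM"),
  State = c("AK", "AK", "AK", "AL", "AL", "AL"),
  rate  = c(11.4, 10.8, 11.4, 10.6, 10.5, 10.6)
)

# Sort in place by state, then rate, then hospital name
setorder(toy, State, rate, Hospital.Name)

# rowid() counts 1, 2, 3, ... within each distinct value of State
toy[, rank := rowid(State)]

print(toy)
```

Because the ties are already in alphabetical order after sorting, the counter assigns consecutive ranks within each state.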