I'm currently taking the introductory R Programming course on Coursera, and I have some questions about my code. Here it is:
rankall <- function(outcome, num = "best") {
    ## Read outcome data; stringsAsFactors = TRUE keeps the factor columns this
    ## code relies on (read.csv() stopped creating factors by default in R 4.0.0)
    dat <- read.csv("outcome-of-care-measures.csv", stringsAsFactors = TRUE)
    ## Check that outcome is valid
    outcomeValues <- c("heart attack", "heart failure", "pneumonia")
    if (!(outcome %in% outcomeValues)) {
        stop("invalid outcome")
    }
    column <- if (outcome == "heart attack") {
        11
    } else if (outcome == "heart failure") {
        17
    } else if (outcome == "pneumonia") {
        23
    }
    dat[, column] <- suppressWarnings(as.numeric(levels(dat[, column])[dat[, column]]))
    dat[, 2] <- as.character(dat[, 2])
    dat[, 11] <- as.numeric(dat[, 11]) # heart attack
    dat[, 17] <- as.numeric(dat[, 17]) # heart failure
    dat[, 23] <- as.numeric(dat[, 23]) # pneumonia
    result <- vector()  # accumulates (hospital, state) pairs
    states <- levels(dat[, 7])
    ## Return hospital name in that state with lowest 30-day death rate
    ## For each state, find the hospital of the given rank
    for (i in seq_along(states)) {
        stateData <- dat[grep(states[i], dat$State), ]
        outcomeData <- stateData[order(stateData[, column], stateData[, 2], na.last = NA), ]
        hospital <- if (num == "best" || num == 1) {
            outcomeData[1, 2]
        } else if (num == "worst") {
            outcomeData[nrow(outcomeData), 2]
        } else {
            outcomeData[num, 2]
        }
        result <- append(result, c(hospital, states[i]))
    }
    ## Return a data frame with the hospital names and the (abbreviated) state name
    result <- as.data.frame(matrix(result, nrow = length(states), ncol = 2, byrow = TRUE))
    colnames(result) <- c("Hospital names", "State")
    result
}
In the code above, if I replace
states <- levels(dat[, 7])
with
states <- unique(dat[,7])
I don't get the correct output, and I don't understand why.
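One likely cause, sketched with a made-up factor `st` standing in for `dat[, 7]` (hypothetical data, not the course file): for a factor created directly by `read.csv()`, `levels()` and `unique()` contain the same states, but `levels()` returns a sorted character vector while `unique()` returns a factor in order of first appearance. Because `c()` dispatches on its first argument, `c(hospital, states[i])` with a character `hospital` and a factor `states[i]` stores the factor's integer code instead of the state abbreviation.

```r
# Hypothetical state column, standing in for dat[, 7]
st <- factor(c("TX", "AK", "TX", "CA"))

# levels(): a sorted character vector
print(levels(st))
# unique(): still a factor, in order of first appearance
print(unique(st))

# c() dispatches on its first argument (a character string here), so a
# factor element is stored as its integer code rather than its label
by_levels <- c("Some Hospital", levels(st)[1])
by_unique <- c("Some Hospital", unique(st)[1])
print(by_levels)
print(by_unique)
```

If `unique()` is preferred, `sort(as.character(unique(dat[, 7])))` should give the same character vector as `levels(dat[, 7])` whenever every level actually occurs in the column.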
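Also, if I remove
dat[, column] <- suppressWarnings(as.numeric(levels(dat[, column])[dat[, column]]))
my code doesn't produce the correct output. I tried looking up the documentation for suppressWarnings, but either I missed something or I couldn't find the right answer anywhere.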
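A minimal sketch of why that line matters, assuming the rate column is read as a factor whose labels include non-numeric text such as "Not Available" (assumed here; the values below are made up). `as.numeric()` on a factor returns its integer level codes, not the numbers printed in the labels. Indexing `levels(f)` by `f` recovers the labels first; `suppressWarnings()` only silences the "NAs introduced by coercion" warning and does not change the result.

```r
# Made-up death-rate column as it might be read into a factor
rates <- factor(c("9.1", "10.5", "Not Available", "9.1"))

# Wrong: integer codes of the sorted levels "10.5", "9.1", "Not Available"
codes <- as.numeric(rates)

# Right: map each element to its label, then convert; "Not Available"
# becomes NA and the coercion warning is silenced
values <- suppressWarnings(as.numeric(levels(rates)[rates]))

print(codes)
print(values)
```

R FAQ 7.10 suggests the variant `as.numeric(levels(f))[f]`, which converts each level once; the line in the question converts after indexing and gives the same values.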
Answer (score: 2)
Because a factor column can have more levels than values that actually occur in it, for example:
x <- factor(1:3, levels = 1:4)
x
# [1] 1 2 3
# Levels: 1 2 3 4
unique(x)
# [1] 1 2 3
# Levels: 1 2 3 4
length(unique(x))
# [1] 3
levels(x)
# [1] "1" "2" "3" "4"
length(levels(x))
# [1] 4
This is useful if we later want to add the value "4" to x.
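To make that concrete, a short sketch using the same `x`: assigning a value that is already a level is accepted, while a value outside the level set becomes `NA` (R emits an "invalid factor level" warning, silenced here).

```r
x <- factor(1:3, levels = 1:4)

# "4" is an existing (so far unused) level, so this assignment is accepted
x[4] <- "4"

# "5" is not a level: the new element becomes NA and R warns
suppressWarnings(x[5] <- "5")

print(x)
print(levels(x))
```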