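I'm working on a Coursera assignment: finding the best hospital in a given US state. I built a small data set from the original one. When I run my function on the small data set, it gives the correct answer. When I run it on the larger data set, I get a result for some values, but for others I get the following:
[1]
Levels:
Warning message:
In best("TX", "heart attack") : NAs introduced by coercion
Here is my code: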
## The best hospital problem
best <- function(state, outcome) {
  setwd("C:/Users/Praveen/Documents/R/COURSERA/R-programming/Week 3/Programming_Assignment")
  ## Reading the data set
  x <- read.csv("outcome-of-care-measures.csv", header = TRUE)
  ## Vector of unique states in the data set
  statevector <- unique(x$State)
  ## Vector of outcomes
  outcomevector <- c("heart attack", "heart failure", "pneumonia")
  ## Checking validity of arguments
  if (!(state %in% statevector)) {
    stop("Invalid State")
  } else if (!(outcome %in% outcomevector)) {
    stop("Invalid Outcome")
  } else {
    message("OK")
  }
  ## Subsetting the data to get the rows for the requested state
  X <- subset(x, x$State == state)
  ## If outcome is "heart attack", calculate the minimum value in the 11th
  ## column of the subset, then subset again on that minimum value
  if (outcome == outcomevector[1]) {
    y <- as.numeric(as.character(X[, 11]))
    z <- min(y, na.rm = TRUE)
    z
    subsetx <- subset(X, X[, 11] == z)
    answer <- subsetx[2]
    answer
    ## If outcome is "heart failure", calculate the minimum value in the 17th
    ## column of the subset, then subset again on that minimum value
  } else if (outcome == outcomevector[2]) {
    y <- as.numeric(as.character(X[, 17]))
    z <- min(y, na.rm = TRUE)
    z
    subsetx <- subset(X, X[, 17] == z)
    answer <- subsetx[2]
    answer
    ## If outcome is "pneumonia", calculate the minimum value in the 23rd
    ## column of the subset, then subset again on that minimum value
  } else {
    y <- as.numeric(as.character(X[, 23]))
    z <- min(y, na.rm = TRUE)
    z
    subsetx <- subset(X, X[, 23] == z)
    answer <- subsetx[2]
    answer
  }
  ## If two or more hospitals share the minimum value, sort alphabetically
  ## and select the hospital that comes first
  FA <- answer[with(answer, order(Hospital.Name)), ]
  FFA <- FA[1]
  FFANS <- droplevels(FFA)
  FFANS
}
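Answer 0: (score: 3)
There are several problems with the code, but the immediate one is that you have been bitten by the factor bug. Compare these values: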
class(z)
#[1] "numeric"
class(X[,11])
#[1] "factor"
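So when you run the command subsetx <- subset(X, X[,11]==z), you can get no match even though one exists. Try this instead:
subset(X, as.numeric(as.character(X[,11]))==z)
The column is converted with as.character and then as.numeric (calling as.numeric directly on a factor would return its internal level codes rather than the rates), which gives this output: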
best("TX", "heart attack")
#OK
#[1] CYPRESS FAIRBANKS MEDICAL CENTER
#Levels: CYPRESS FAIRBANKS MEDICAL CENTER
#Warning message:
#In best("TX", "heart attack") : NAs introduced by coercion
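You still get the warning because the factors were never eliminated at the source, so the "Not Available" entries are still being coerced to NA. It is hard to say where to start fixing the approach, but this may help you get through the assignment.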
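To make the mismatch concrete, here is a minimal sketch on made-up rates (the values and the names rates, values and z are illustrative, not taken from the real data set). It assumes the column was read in as a factor, as in the question:

```r
# Toy column as it would look if read in as a factor (made-up values)
rates <- factor(c("12.0", "13.5", "Not Available", "14.2"))

# Convert via character first; "Not Available" becomes NA (normally with a warning)
values <- suppressWarnings(as.numeric(as.character(rates)))
z <- min(values, na.rm = TRUE)

# Comparing the factor to the number compares the labels as strings,
# so "12.0" is tested against "12" and nothing matches
print(rates == z)

# as.numeric() on the factor itself returns internal level codes, not the rates
print(as.numeric(rates))

# Comparing the converted numbers finds the row holding the minimum
print(which(values == z))
```

This would also fit the symptom that only some states fail: a minimum such as 9.8 has the same text form either way, whereas a minimum such as 12.0 does not.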
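Update
You can start by adding two arguments to read.csv. Set stringsAsFactors to FALSE so that strings stay as character, and set na.strings to "Not Available". The latter tells R what to look for in the file to identify missing values.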
x <- read.csv(file , header =TRUE, stringsAsFactors=F, na.strings="Not Available")
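A small sketch of what the two arguments change, using an inline CSV with made-up hospital names and rates (csv_text, raw and clean are illustrative names, and the column layout is an assumption, not the real file's):

```r
# Inline stand-in for the CSV file (made-up rows)
csv_text <- "Hospital.Name,State,Rate
ALPHA HOSPITAL,TX,12.0
BETA CLINIC,TX,Not Available
GAMMA MEDICAL,TX,13.5"

# Without na.strings the Rate column cannot be parsed as numbers
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# With na.strings, "Not Available" becomes NA and Rate is read as numeric
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                  na.strings = "Not Available")

print(class(raw$Rate))
print(class(clean$Rate))
print(min(clean$Rate, na.rm = TRUE))
```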
With this correction in place, you can remove all of the as.numeric and as.character calls. Look at what I did with the heart attack section:
if(outcome == outcomevector[1]){
  y <- X[,11]
  z <- min(y, na.rm=TRUE)
  subsetx <- subset(X, X[,11] %in% z)
  answer <- subsetx[2]
  answer
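Now y can take the values of X[,11] directly, and subsetx needs no special handling either.
At the bottom, you can now remove the last line, the one that drops the factor levels. I changed the ending to: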
FA <- answer[with(answer, order(Hospital.Name)), ]
FA[1]
Now the code runs without a warning:
best("TX", "heart attack")
#OK
#[1] "CYPRESS FAIRBANKS MEDICAL CENTER"
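The tie-breaking step at the end can be checked in isolation. This is only a sketch on a made-up data frame (hospitals, tied and winner are illustrative names), assuming the rates are already numeric:

```r
# Made-up hospitals; two of them tie for the lowest rate
hospitals <- data.frame(
  Hospital.Name = c("ZETA GENERAL", "ALPHA MEMORIAL", "MID COUNTY"),
  Rate = c(10.1, 10.1, 12.3),
  stringsAsFactors = FALSE
)

# Keep every row that matches the minimum rate
tied <- hospitals[which(hospitals$Rate == min(hospitals$Rate, na.rm = TRUE)), ]

# Sort the tied names alphabetically and take the first one
winner <- sort(tied$Hospital.Name)[1]
print(winner)
```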
Update 2
Here is a shortened version of the code:
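best2 <- function(state, outcome) {
  setwd("C:/Users/Praveen/Documents/R/COURSERA/R-programming/Week 3/Programming_Assignment")
  # na.strings makes the rate columns numeric, so min() compares numbers, not text
  x <- read.csv("outcome-of-care-measures.csv", header = TRUE,
                stringsAsFactors = FALSE, na.strings = "Not Available")
  outcomevector <- c("heart attack", "heart failure", "pneumonia")
  if (!(state %in% unique(x$State))) stop("Invalid State")
  if (!(outcome %in% outcomevector)) stop("Invalid Outcome")
  X <- x[x$State == state, ]
  # Rename the three rate columns so they can be selected by outcome name
  names(X)[c(11, 17, 23)] <- outcomevector
  # which() drops the NA comparisons that missing rates would otherwise produce
  answer <- X[which(X[, outcome] == min(X[, outcome], na.rm = TRUE)), ][2]
  FA <- answer[with(answer, order(Hospital.Name)), ]
  FA[1]
}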
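The column-renaming trick in the shortened version can be tried on a toy data frame. The sketch below uses made-up values and a hypothetical helper pick_best; the column positions 2 and 3 stand in for the real file's 11, 17 and 23:

```r
# Made-up data: two rate columns with some missing values
toy <- data.frame(
  Hospital.Name = c("A", "B", "C"),
  col_a = c(14.2, NA, 12.0),
  col_b = c(11.0, 10.5, NA),
  stringsAsFactors = FALSE
)

# Rename the rate columns so the outcome string selects the column directly
names(toy)[c(2, 3)] <- c("heart attack", "heart failure")

# Hypothetical helper: hospital(s) with the lowest rate for one outcome
pick_best <- function(df, outcome) {
  rates <- df[, outcome]
  df$Hospital.Name[which(rates == min(rates, na.rm = TRUE))]
}

print(pick_best(toy, "heart attack"))
print(pick_best(toy, "heart failure"))
```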
Answer 1: (score: 0)
Make sure you handle the Not Available entries (your assignment requires it).
First convert every Not Available to NA by adding na.strings = "Not Available" to the read statement:
x <- read.csv("outcome-of-care-measures.csv" , header =TRUE, na.strings = "Not Available")
Then use complete.cases or na.omit:
completedata <- (na.omit(x))
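One thing to watch, shown here on made-up data: na.omit on the whole data frame drops a hospital if any column is missing, while complete.cases on a single column only drops hospitals missing that outcome. The column names heart_attack and pneumonia are illustrative, not the real file's:

```r
# Made-up data: each hospital is missing at most one rate
x <- data.frame(
  Hospital.Name = c("A", "B", "C"),
  heart_attack = c(14.2, NA, 12.0),
  pneumonia = c(NA, 10.5, 11.0),
  stringsAsFactors = FALSE
)

# Drops every row that has an NA in any column
all_complete <- na.omit(x)

# Drops only the rows missing the outcome being ranked
by_outcome <- x[complete.cases(x$heart_attack), ]

print(all_complete$Hospital.Name)
print(by_outcome$Hospital.Name)
```

If the ranking is done per outcome, filtering on that one column is probably the safer choice.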
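Answer 2: (score: 0)
I struggled with this problem too, but eventually figured it out. Here is my solution, with comments explaining why I used certain pieces of code. I'm a beginner in R, so I'm not sure this is the most efficient approach, especially where I save intermediate data frames. Feel free to improve my code if you get the chance: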
best <- function(state, outcome) {
  ## Read outcome data
  outcome_read <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
  ## Subset out only the columns you need.
  subsetOutcome <- outcome_read[, c(7, 11, 17, 23, 2)]
  # Convert to numeric, as the rate columns were read in as character.
  # You can see this if you run str(outcome_read).
  subsetOutcome$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure <- as.numeric(subsetOutcome$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure)
  subsetOutcome$Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia <- as.numeric(subsetOutcome$Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia)
  subsetOutcome$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack <- as.numeric(subsetOutcome$Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack)
  ## Check that the state is valid using is.element, which tests whether
  ## the element exists within the column.
  if (!is.element(state, subsetOutcome[, 1])) {
    stop("invalid state")
  }
  # Check whether the outcome argument is a valid health condition.
  condition <- factor(c("heart attack", "heart failure", "pneumonia"))
  if (!any(c("heart attack", "heart failure", "pneumonia") == outcome)) {
    stop("invalid outcome")
  }
  # Determine the column number from the level of the condition factor.
  # E.g. if the outcome is heart failure, the level is 2, which becomes the column number.
  colNumber <- as.numeric(condition[levels(condition) == outcome])
  # Self check: print to see whether the column number makes sense.
  print(colNumber)
  ## Remove all the NAs within the outcome column (e.g. the heart failure column) you are looking at.
  resultState <- subsetOutcome[complete.cases(subsetOutcome[, colNumber + 1]), ]
  # Subset the data frame to include only the state you are looking at.
  resultState <- subset(resultState, resultState$State == state)
  # Order the results by the condition (e.g. heart failure), then by hospital name.
  result <- resultState[order(resultState[, colNumber + 1], resultState[, 5]), ]
  # Return the first row and just the Hospital.Name column.
  result[1, "Hospital.Name"]
}
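As a possible simplification of the factor-level trick above (a different technique, not what this answer uses), match() returns the position of the outcome in a plain character vector, or NA when it is absent, so one call can serve as both the validity check and the column offset. outcome_position is a hypothetical helper name:

```r
# The three valid outcomes, in the same order as the subset columns
outcomes <- c("heart attack", "heart failure", "pneumonia")

# Hypothetical helper: position of the outcome, or an error if it is not valid
outcome_position <- function(outcome) {
  pos <- match(outcome, outcomes)
  if (is.na(pos)) stop("invalid outcome")
  pos
}

print(outcome_position("heart failure"))
```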