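R: Error in a data frame

Time: 2015-08-10 12:36:30

Tags: r dataframe

I want to return a 2-column data frame containing the hospital in each state that has the specified rank.

Here is the beginning of my function: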

rankall <- function(outcome, num = "best") {
data_frame <- data.frame()
data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
outcomes <- c("heart attack", "heart failure", "pneumonia")
if(!outcome %in% outcomes){stop("invalid outcome")}
df <- subset(data, select = c(Hospital.Name,State,Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack,
                            Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure,
                            Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia))

Here is part of my code:

for (i in states){
    if(outcome == "heart attack"){
        if(num =="best"){
            m <- df$Hospital.Name[which.min(df[ ,3])]
            data_frame <- rbind(data_frame,data.frame(m,i))
            }else if(num == "worst"){
                m <- df$Hospital.Name[which.max(df[ ,3])]
                data_frame <- rbind(data_frame,data.frame(m,i))
                }else{
                    df <- df[order(df[,3],df[["Hospital.Name"]],decreasing=FALSE,na.last=NA),]
                    m<- df[num,"Hospital.Name"]
                    data_frame <- rbind(data_frame,data.frame(m,i))

The problem is that my data frame returns the same hospital for every state. Here is my output:

 head(rankall("heart attack", 20), 10)
m  i
1  NORTHWESTERN MEMORIAL HOSPITAL AL
2  NORTHWESTERN MEMORIAL HOSPITAL AK
3  NORTHWESTERN MEMORIAL HOSPITAL AZ
4  NORTHWESTERN MEMORIAL HOSPITAL AR
5  NORTHWESTERN MEMORIAL HOSPITAL CA
6  NORTHWESTERN MEMORIAL HOSPITAL CO
7  NORTHWESTERN MEMORIAL HOSPITAL CT
8  NORTHWESTERN MEMORIAL HOSPITAL DE
9  NORTHWESTERN MEMORIAL HOSPITAL DC
10 NORTHWESTERN MEMORIAL HOSPITAL FL

Could you tell me how to return a data frame containing the hospital in each state that has the specified rank?

Data

structure(list(Hospital.Name = c("SOUTHEAST ALABAMA MEDICAL CENTER", 
"MARSHALL MEDICAL CENTER SOUTH", "ELIZA COFFEE MEMORIAL HOSPITAL", 
"MIZELL MEMORIAL HOSPITAL", "CRENSHAW COMMUNITY HOSPITAL", "MARSHALL MEDICAL CENTER NORTH", 
"ST VINCENT'S EAST", "DEKALB REGIONAL MEDICAL CENTER", "SHELBY BAPTIST MEDICAL CENTER", 
"CALLAHAN EYE FOUNDATION HOSPITAL", "HELEN KELLER MEMORIAL HOSPITAL", 
"DALE MEDICAL CENTER", "CHEROKEE MEDICAL CENTER", "BAPTIST MEDICAL CENTER SOUTH", 
"JACKSON HOSPITAL & CLINIC INC"), State = c("AL", "AL", "AL", 
"AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
"AL"), Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack = c("14.3", 
"18.5", "18.1", "Not Available", "Not Available", "Not Available", 
"17.7", "18.0", "15.9", "Not Available", "19.6", "17.3", "Not Available", 
"17.8", "17.5"), Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure = c("11.4", 
"15.2", "11.3", "13.6", "13.8", "12.5", "10.9", "16.6", "13.6", 
"Not Available", "12.6", "11.8", "12.1", "11.8", "10.2"), Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia = c("10.9", 
"13.9", "13.4", "14.9", "15.8", "8.7", "16.2", "15.8", "10.7", 
"Not Available", "15.0", "9.9", "14.7", "14.3", "14.7")), .Names = c("Hospital.Name", 
"State", "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack", 
"Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure", 
"Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia"), row.names = c(NA, 
15L), class = "data.frame")

1 Answer:

Answer 0 (score: 0)

rankall <- function(outcome, num = "best") {
  data_frame <- data.frame()
  data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
  outcomes <- c("heart attack", "heart failure", "pneumonia")
  if(!outcome %in% outcomes){stop("invalid outcome")}
  df <- subset(data, select = c(Hospital.Name,State,Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack,
                            Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure,
                            Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia))

  # Coerce the three rate columns to numeric; "Not Available" becomes NA
  df[,3:5] <- suppressWarnings(lapply(df[,3:5], as.numeric))
  states <- unique(df$State)
  for (i in states){
    # Only the "heart attack" branch (column 3) is shown here
    if(outcome == "heart attack"){
      # which.min/which.max return a position within the state subset,
      # so the hospital names are subset by the same state before indexing
      if(num == "best"){
        m <- df$Hospital.Name[df$State == i][which.min(df[,3][df$State == i])]
        data_frame <- rbind(data_frame,data.frame(m,i))
      }else if(num == "worst"){
        m <- df$Hospital.Name[df$State == i][which.max(df[,3][df$State == i])]
        data_frame <- rbind(data_frame,data.frame(m,i))
      }else{
        # Sort this state's rows by rate, then by hospital name; rows with NA are dropped
        df1 <- df[df$State == i,][order(df[,3][df$State == i],df$Hospital.Name[df$State == i],decreasing=FALSE,na.last=NA),]
        m <- df1[num,"Hospital.Name"]
        data_frame <- rbind(data_frame,data.frame(m,i))
      }
    }
  }
  # Return the result sorted by state
  data_frame[order(data_frame$i),]
}

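I made a few changes to the code:

  1. Coerced the measurement columns to class numeric with the line df[,3:5] <- suppressWarnings(lapply(df[,3:5], as.numeric)). The functions which.min and which.max need numeric columns.

  2. Added a subset for the relevant state with df[,3][df$State == i]. You got the same hospital for every state because nothing inside your loop restricted the data to the state i.

  3. Added the output line data_frame[order(data_frame$i),] at the end to return the final data frame sorted alphabetically by state.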
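
The three changes above can also be sketched without the explicit loop. This is a minimal sketch, not the code from the question: the sample data, the single Rate column, and the names rank_in_state and rank_all_sketch are all hypothetical and only there to illustrate the idea. It coerces the rate to numeric, splits the rows by state, and picks the hospital at the requested rank within each state.

```r
# Hypothetical sample data: rates are stored as character, as in the CSV.
df <- data.frame(
  Hospital.Name = c("A HOSPITAL", "B HOSPITAL", "C HOSPITAL",
                    "D HOSPITAL", "E HOSPITAL"),
  State = c("AL", "AL", "AL", "AK", "AK"),
  Rate = c("14.3", "Not Available", "12.1", "18.5", "17.0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: hospital name at rank `num` within one state's rows.
rank_in_state <- function(st, num = "best") {
  # Sort by rate, break ties by name; na.last = NA drops rows with missing rates.
  st <- st[order(st$Rate, st$Hospital.Name, na.last = NA), ]
  if (identical(num, "best")) {
    idx <- 1L
  } else if (identical(num, "worst")) {
    idx <- nrow(st)
  } else {
    idx <- as.integer(num)
  }
  # A rank outside the available rows yields NA instead of an error.
  if (idx < 1L || idx > nrow(st)) return(NA_character_)
  st$Hospital.Name[idx]
}

# Hypothetical wrapper: one row per state, ordered by state.
rank_all_sketch <- function(df, num = "best") {
  # "Not Available" becomes NA; the coercion warning is suppressed on purpose.
  df$Rate <- suppressWarnings(as.numeric(df$Rate))
  by_state <- split(df, df$State)  # list names come out sorted by state
  rows <- lapply(names(by_state), function(s) {
    data.frame(hospital = rank_in_state(by_state[[s]], num),
               state = s, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

rank_all_sketch(df, "best")
rank_all_sketch(df, 2)
```

Because each state's rows are sorted and indexed on their own, the position returned for a rank always refers to a hospital in that state, and a state with fewer hospitals than the requested rank gets NA.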