Question

I am reading data from a CSV file that has 3 columns (hospital name: character, state: character, mortality rate: numeric):

datafile <- read.csv("outcome-of-care-measures.csv", 
    na.strings = "Not Available",
    colClasses = c("character","character","numeric"))

Now I split the data by state:

## split data based on state name
data_split <- split(datafile,datafile$State)

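My problem is to find the "worst" hospital (the one with the highest mortality rate) in each state and display the result. To do that, I first sorted the data (rate is a list):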

for (i in 1:length(data_split)){
  ## remove all rows with NA
  rate[[i]] <- data_split[[i]][complete.cases(data_split[[i]][ ,3]), ]  
  ##sort by mortality and remove
  ## conflict by hospital name
  rate[[i]] <- rate[[i]][order(rate[[i]][, 3],rate[[i]][ ,1]), ]  

}

The program runs, but I get the wrong hospital name for many states. I cannot find the error in my program.

Answer 1

Why split the data.frame at all?

Would something like this help?

df <- data.frame('hospital' = LETTERS[1:6],
                 'state' = rep(c('state1', 'state2', 'state3'),2),
                 'mr' = c(1:6))
df
  hospital  state mr
1        A state1  1
2        B state2  2
3        C state3  3
4        D state1  4
5        E state2  5
6        F state3  6

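## sort by mortality rate, highest first
df2 <- df[with(df, order(-mr, state)), ]

## keep only the first (highest-mortality) row for each state
df2[!duplicated(df2$state), ]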
  hospital  state mr
6        F state3  6
5        E state2  5
4        D state1  4
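As a hedged illustration of the sort-then-deduplicate idea above, here is a minimal sketch that also drops missing rates and breaks ties by hospital name. The data frame, the extra hospital G, and the NA value are made up for this example and are not from the original file. It may also hint at the likely cause of the wrong names in the question: the loop there sorts in ascending order, so the first row of each group is the lowest mortality rate, not the highest.

```r
## Hypothetical sample data: one NA rate and one tie (D and G) in state1
df <- data.frame(hospital = c("A", "B", "C", "D", "E", "F", "G"),
                 state = c("state1", "state2", "state3",
                           "state1", "state2", "state3", "state1"),
                 mr = c(1, 2, NA, 4, 5, 6, 4),
                 stringsAsFactors = FALSE)

## drop rows whose mortality rate is missing
complete <- df[!is.na(df$mr), ]

## order by state, then mortality (descending), then hospital name for ties
ranked <- complete[order(complete$state, -complete$mr, complete$hospital), ]

## the first row per state is now the highest-mortality hospital
worst <- ranked[!duplicated(ranked$state), ]
worst
```

Sorting by state first is only a presentation choice here, so that the result comes out in state order.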

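You can also do this with your approach, keeping a list of all the complete entries for each state (note the minus sign in order(), which puts the highest mortality rate first). But why?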

ds <- split(df, df$state) 
rate <- list()
for (i in seq_along(ds)){
  ## remove all rows with NA in the mortality column
  rate[[i]] <- ds[[i]][complete.cases(ds[[i]][ ,3]), ]  
  ## sort by mortality (descending) and
  ## break ties by hospital name
  rate[[i]] <- rate[[i]][order(- rate[[i]][, 3], rate[[i]][ ,1]), ]  
}
rate

[[1]]
  hospital  state mr
4        D state1  4
1        A state1  1

[[2]]
  hospital  state mr
5        E state2  5
2        B state2  2

[[3]]
  hospital  state mr
6        F state3  6
3        C state3  3
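If the split-based approach is kept, the worst hospital still has to be pulled out of each list element. The following is one possible sketch, not the answerer's code: it reuses the sample data frame from this answer, and the names worst_by_state and result are chosen only for illustration.

```r
## Same sample data as in the answer
df <- data.frame(hospital = LETTERS[1:6],
                 state = rep(c("state1", "state2", "state3"), 2),
                 mr = 1:6,
                 stringsAsFactors = FALSE)

## for each state: drop NA rates, sort descending, keep the top row
worst_by_state <- lapply(split(df, df$state), function(d) {
  d <- d[!is.na(d$mr), ]
  d[order(-d$mr, d$hospital), ][1, ]
})

## stack the per-state rows into a single data frame
result <- do.call(rbind, worst_by_state)
result
```

Because split() names the list elements by state, lapply() keeps those names. A state whose rates are all NA would yield a row of NA values here, so real data may need an extra check.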
