I am reading data from a CSV file that has three columns (hospital name: character, state: character, mortality rate: numeric):
datafile <- read.csv("outcome-of-care-measures.csv",
na.strings = "Not Available",
colClasses = c("character","character","numeric"))
Now I split the data by state:
## split data based on state name
data_split <- split(datafile,datafile$State)
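My problem is to find the "worst" hospital (the one with the highest mortality rate) in each state and display the result. To do that, I first sort the data (rate is a list):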
for (i in 1:length(data_split)){
## remove all rows with NA
rate[[i]] <- data_split[[i]][complete.cases(data_split[[i]][ ,3]), ]
##sort by mortality and remove
## conflict by hospital name
rate[[i]] <- rate[[i]][order(rate[[i]][, 3],rate[[i]][ ,1]), ]
}
The program runs, but for many states I get the wrong hospital name. I can't find the error in the program.
Answer 0 (score: 1)
Why split the data.frame?
Does something like this help?
df <- data.frame('hospital' = LETTERS[1:6],
'state' = rep(c('state1', 'state2', 'state3'),2),
'mr' = c(1:6))
df
hospital state mr
1 A state1 1
2 B state2 2
3 C state3 3
4 D state1 4
5 E state2 5
6 F state3 6
df2 <- df[with(df, order(-mr, state)), ]
df2[!duplicated(df2$state), ]
hospital state mr
6 F state3 6
5 E state2 5
4 D state1 4
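The idea is that order(-mr, ...) puts the highest mortality first and !duplicated() then keeps the first row it sees for each state. A minimal sketch of the same idea with a missing value and a tie thrown in; the column names follow the example above, while the values and the names df_sorted and worst are made up for illustration:

```r
# Toy data (made-up values): state1 has one NA and a tie at mr = 4
df <- data.frame(hospital = c("B", "A", "C", "D", "E"),
                 state    = c("state1", "state1", "state1", "state2", "state2"),
                 mr       = c(4, 4, NA, 2, 5),
                 stringsAsFactors = FALSE)

# Drop rows with a missing mortality rate
df <- df[!is.na(df$mr), ]

# Highest mortality first; ties broken alphabetically by hospital name
df_sorted <- df[order(-df$mr, df$hospital), ]

# Keep only the first (worst) row encountered for each state
worst <- df_sorted[!duplicated(df_sorted$state), ]
worst
```

Here the tie in state1 is resolved in favour of hospital A because the hospital name is the second sort key.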
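You can do this with your approach as well, keeping a list of all the initially complete entries, but why? Note the minus sign in order(), which sorts by descending mortality so that the worst hospital comes first in each state: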
ds <- split(df, df$state)
rate <- list()
for (i in 1:length(ds)){
## remove all rows with NA
rate[[i]] <- ds[[i]][complete.cases(ds[[i]][ ,3]), ]
##sort by mortality and remove
## conflict by hospital name
rate[[i]] <- rate[[i]][order(- rate[[i]][, 3], rate[[i]][ ,1]), ]
}
rate
[[1]]
hospital state mr
4 D state1 4
1 A state1 1
[[2]]
hospital state mr
5 E state2 5
2 B state2 2
[[3]]
hospital state mr
6 F state3 6
3 C state3 3
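If you do keep the list, the worst hospital per state is just the first row of each element. A rough sketch, assuming the same toy df as above; the name worst is only for illustration, and lapply() stands in for the for loop:

```r
# Same toy data as in the example above
df <- data.frame(hospital = LETTERS[1:6],
                 state    = rep(c("state1", "state2", "state3"), 2),
                 mr       = 1:6,
                 stringsAsFactors = FALSE)

ds <- split(df, df$state)

# For each state: drop NA rates, then sort by descending mortality,
# breaking ties by hospital name
rate <- lapply(ds, function(d) {
  d <- d[!is.na(d$mr), ]
  d[order(-d$mr, d$hospital), ]
})

# First row of each element is the worst hospital in that state.
# Caveat: a state with no complete rows would yield a row of NAs here.
worst <- do.call(rbind, lapply(rate, function(d) d[1, ]))
worst
```

Because split() names the list elements by state, the list can also be indexed by name, e.g. rate[["state2"]].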