My data frame looks like this:
hospital <- c("PROVIDENCE ALASKA MEDICAL CENTER", "ALASKA REGIONAL HOSPITAL", "FAIRBANKS MEMORIAL HOSPITAL",
"CRESTWOOD MEDICAL CENTER", "BAPTIST MEDICAL CENTER EAST", "ARKANSAS HEART HOSPITAL",
"MEDICAL CENTER NORTH LITTLE ROCK", "CRITTENDEN MEMORIAL HOSPITAL")
state <- c("AK", "AK", "AK", "AL", "AL", "AR", "AR", "AR")
rank <- c(1,2,3,1,2,1,2,3)
df <- data.frame(hospital, state, rank)
df
                          hospital state rank
1 PROVIDENCE ALASKA MEDICAL CENTER    AK    1
2         ALASKA REGIONAL HOSPITAL    AK    2
3      FAIRBANKS MEMORIAL HOSPITAL    AK    3
4         CRESTWOOD MEDICAL CENTER    AL    1
5      BAPTIST MEDICAL CENTER EAST    AL    2
6          ARKANSAS HEART HOSPITAL    AR    1
7 MEDICAL CENTER NORTH LITTLE ROCK    AR    2
8     CRITTENDEN MEMORIAL HOSPITAL    AR    3
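I want to create a function, rankall, that takes a rank as its argument and returns the hospital with that rank in each state, or NA if a state has no hospital with the given rank. For example, I want the output of rankall(rank = 3) to look like this:
                       hospital state
AK  FAIRBANKS MEMORIAL HOSPITAL    AK
AL                         <NA>    AL
AR CRITTENDEN MEMORIAL HOSPITAL    AR
I tried: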
rankall <- function(rank) {
  split_by_state <- split(df, df$state)
  ranked_hospitals <- lapply(split_by_state, function (x) {
    x[(x$rank==rank), ]
  })
  combined_ranked_hospitals <- do.call(rbind, ranked_hospitals)
  return(combined_ranked_hospitals[ ,1:2])
}
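But rankall(rank = 3) returns:
                       hospital state
AK  FAIRBANKS MEMORIAL HOSPITAL    AK
AR CRITTENDEN MEMORIAL HOSPITAL    AR
This drops the NA row that I need to keep track of. Is there a way to make R treat a state with no matching rows in the list as NA inside my function, rather than as an empty result that rbind silently discards? Is there a function other than lapply that is better suited to this task?
[Note: this data frame comes from the Coursera R Programming course. This is also my first post on Stack Overflow and my first time learning to program. Thanks to everyone who offers solutions and advice; this forum is great.]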
Answer 0: (score: 1)
You just need an if/else in your function:
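rankall <- function(rank) {
  split_by_state <- split(df, df$state)
  ranked_hospitals <- lapply(split_by_state, function(x) {
    indx <- x$rank == rank
    if (any(indx)) {
      return(x[indx, ])
    } else {
      # No hospital at this rank: keep one row for the state, blank the name
      out <- x[1, ]
      out$hospital <- NA
      return(out)
    }
  })
  # Combine the per-state rows, as in the question's original function
  combined_ranked_hospitals <- do.call(rbind, ranked_hospitals)
  combined_ranked_hospitals[, 1:2]
}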
Answer 1: (score: 1)
Here is another approach:
rankall <- function(rank) {
  do.call(rbind, lapply(split(df, df$state), function(df) {
    tmp <- df[df$rank == rank, 1:2]
    if (!nrow(tmp)) return(transform(df[1, 1:2], hospital = NA)) else return(tmp)
  }))
}
rankall(3)
# hospital state
# AK FAIRBANKS MEMORIAL HOSPITAL AK
# AL <NA> AL
# AR CRITTENDEN MEMORIAL HOSPITAL AR
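On the question of whether something other than lapply fits this task: one possibility is a left join of the full list of states against the matching rows, so that states with no match come back as NA automatically. The following is only a sketch in base R. The name rankall_merge and its data argument are made up for illustration, and it assumes at most one hospital per rank within a state (duplicate ranks would make the row names clash).

```r
hospital <- c("PROVIDENCE ALASKA MEDICAL CENTER", "ALASKA REGIONAL HOSPITAL",
              "FAIRBANKS MEMORIAL HOSPITAL", "CRESTWOOD MEDICAL CENTER",
              "BAPTIST MEDICAL CENTER EAST", "ARKANSAS HEART HOSPITAL",
              "MEDICAL CENTER NORTH LITTLE ROCK", "CRITTENDEN MEMORIAL HOSPITAL")
state <- c("AK", "AK", "AK", "AL", "AL", "AR", "AR", "AR")
rank <- c(1, 2, 3, 1, 2, 1, 2, 3)
df <- data.frame(hospital, state, rank, stringsAsFactors = FALSE)

# Hypothetical helper: left-join every state against the rows at chosen_rank.
rankall_merge <- function(data, chosen_rank) {
  # One row per state, so that no state can drop out of the result
  states <- data.frame(state = sort(unique(data$state)),
                       stringsAsFactors = FALSE)
  # Rows that actually have the requested rank
  hits <- data[data$rank == chosen_rank, c("hospital", "state")]
  # all.x = TRUE keeps states without a match and fills hospital with NA
  out <- merge(states, hits, by = "state", all.x = TRUE)
  out <- out[, c("hospital", "state")]
  # Assumes one hospital per rank per state, otherwise names would repeat
  rownames(out) <- out$state
  out
}

rankall_merge(df, 3)
```

Passing the data frame in as an argument, rather than reading the global df, also makes the function easier to reuse on the full course data set.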
Answer 2: (score: 1)
Here is another approach, using dplyr.
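library(dplyr)

# Picks the x-th row within each state, so it assumes df is already
# sorted by rank within each state and that the ranks have no gaps.
fun1 <- function(x) {
  group_by(df, state) %>%
    summarise(hospital = hospital[x],
              rank = nth(rank, x))
}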
# fun1(3)
#Source: local data frame [3 x 3]
#
# state hospital rank
#1 AK FAIRBANKS MEMORIAL HOSPITAL 3
#2 AL NA NA
#3 AR CRITTENDEN MEMORIAL HOSPITAL 3
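One caveat worth spelling out: this answer selects by position within each state, not by the value of rank, so it only gives the right hospital when the rows are already ordered by rank. The sketch below uses base R and a small made-up data frame (toy, with rows deliberately out of order) to show the difference; none of these names come from the original post.

```r
# Hypothetical toy data: within AK the rank-2 hospital is listed first
toy <- data.frame(hospital = c("B HOSPITAL", "A HOSPITAL", "C HOSPITAL"),
                  state    = c("AK", "AK", "AL"),
                  rank     = c(2, 1, 1),
                  stringsAsFactors = FALSE)

# Position-based: take the 2nd row of each state as it happens to be stored.
# Indexing past the end of a vector yields NA, which is why AL gives NA.
by_position <- sapply(split(toy, toy$state), function(g) g$hospital[2])

# Same idea after sorting by state and rank, which makes position match rank
sorted <- toy[order(toy$state, toy$rank), ]
by_position_sorted <- sapply(split(sorted, sorted$state),
                             function(g) g$hospital[2])

# Value-based: look the rank up directly, independent of row order
by_rank <- sapply(split(toy, toy$state), function(g) {
  hit <- g$hospital[g$rank == 2]
  if (length(hit)) hit[1] else NA_character_
})

by_position         # AK is "A HOSPITAL", which actually has rank 1
by_position_sorted  # AK is "B HOSPITAL"
by_rank             # AK is "B HOSPITAL"
```

With the data frame from the question the rows are already in rank order, so the position-based version happens to agree with the value-based one.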
Answer 3: (score: 0)
I think this is a good use case for dplyr. The only odd thing is that summarize complains when I use NA instead of "NA". Does anyone know why?
library(dplyr)
rankall <- function(chosen_rank){
  group_by(df, state) %>%
    summarize(hospital = ifelse(length(hospital[rank==chosen_rank])!=0,
                                as.character(hospital[rank==chosen_rank]), "NA"),
              rank = chosen_rank)
}
rankall(1)
rankall(2)
rankall(3)
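A likely explanation for the complaint, offered as a guess rather than a confirmed diagnosis: a bare NA is a logical value, so ifelse returns a character result for states that have a match and a logical result for states that do not, and summarize (at least in dplyr versions of that era) expects one consistent type across groups. The base R sketch below only demonstrates the type difference; "SOME HOSPITAL" is a placeholder value.

```r
# ifelse() with a bare NA: the result type depends on which branch is taken
hit  <- ifelse(TRUE,  "SOME HOSPITAL", NA)
miss <- ifelse(FALSE, "SOME HOSPITAL", NA)
class(hit)   # "character"
class(miss)  # "logical"

# NA_character_ is a missing value that is already of type character,
# so both branches now agree on the type
miss_typed <- ifelse(FALSE, "SOME HOSPITAL", NA_character_)
class(miss_typed)  # "character"

# The string "NA" is ordinary text, not a missing value
is.na("NA")          # FALSE
is.na(miss_typed)    # TRUE
```

If that is indeed the cause, replacing "NA" with NA_character_ in the function above should keep a real missing value in the hospital column while avoiding the type mismatch.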