Empty rows in a list as NA values in a data.frame in R

Time: 2015-02-27 20:04:05

Tags: r list lapply na rbind

My data frame looks like this:

hospital <- c("PROVIDENCE ALASKA MEDICAL CENTER", "ALASKA REGIONAL HOSPITAL", "FAIRBANKS MEMORIAL HOSPITAL", 
          "CRESTWOOD MEDICAL CENTER", "BAPTIST MEDICAL CENTER EAST", "ARKANSAS HEART HOSPITAL", 
          "MEDICAL CENTER NORTH LITTLE ROCK", "CRITTENDEN MEMORIAL HOSPITAL")
state <- c("AK", "AK", "AK", "AL", "AL", "AR", "AR", "AR")
rank <- c(1,2,3,1,2,1,2,3)
df <- data.frame(hospital, state, rank)
df

                                 hospital    state     rank
    1   PROVIDENCE ALASKA MEDICAL CENTER        AK        1
    2   ALASKA REGIONAL HOSPITAL                AK        2
    3   FAIRBANKS MEMORIAL HOSPITAL             AK        3
    4   CRESTWOOD MEDICAL CENTER                AL        1
    5   BAPTIST MEDICAL CENTER EAST             AL        2
    6   ARKANSAS HEART HOSPITAL                 AR        1
    7   MEDICAL CENTER NORTH LITTLE ROCK        AR        2
    8   CRITTENDEN MEMORIAL HOSPITAL            AR        3

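I want to create a function rankall that takes rank as an argument and returns, for each state, the hospital with that rank, or NA if the state has no hospital with the given rank. For example, I want the output of rankall(rank = 3) to look like this: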

                           hospital     state 
    AK  FAIRBANKS MEMORIAL HOSPITAL        AK    
    AL                         <NA>        AL
    AR CRITTENDEN MEMORIAL HOSPITAL        AR    

I tried:

rankall <- function(rank) {
split_by_state <- split(df, df$state)
ranked_hospitals <- lapply(split_by_state, function (x) {
    x[(x$rank==rank), ]
})
combined_ranked_hospitals <- do.call(rbind, ranked_hospitals)
return(combined_ranked_hospitals[ ,1:2])
}

But rankall(rank = 3) returns:

                                 hospital     state     
    AK       FAIRBANKS MEMORIAL HOSPITAL         AK                        
    AR       CRITTENDEN MEMORIAL HOSPITAL        AR             

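This drops the NA values that I need to keep track of. Is there a way to make R treat an empty row in a list element as NA inside my function, instead of dropping it? Is there a function other than lapply that is better suited to this task?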
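To see where the AL row goes, here is a minimal sketch that rebuilds the data frame from above and counts the matching rows per state. The names `pieces`, `rows_per_state` and `combined` are only illustrative. The subset for AL has zero rows, so `rbind` has nothing to add for that state.

```r
# Same data as above, rebuilt so the snippet runs on its own
df <- data.frame(
  hospital = c("PROVIDENCE ALASKA MEDICAL CENTER", "ALASKA REGIONAL HOSPITAL",
               "FAIRBANKS MEMORIAL HOSPITAL", "CRESTWOOD MEDICAL CENTER",
               "BAPTIST MEDICAL CENTER EAST", "ARKANSAS HEART HOSPITAL",
               "MEDICAL CENTER NORTH LITTLE ROCK", "CRITTENDEN MEMORIAL HOSPITAL"),
  state = c("AK", "AK", "AK", "AL", "AL", "AR", "AR", "AR"),
  rank = c(1, 2, 3, 1, 2, 1, 2, 3)
)

# One subset per state, keeping only the rows with rank 3
pieces <- lapply(split(df, df$state), function(x) x[x$rank == 3, ])

# Number of matching rows per state; AL has none
rows_per_state <- sapply(pieces, nrow)
print(rows_per_state)

# A zero-row piece contributes no row, so AL is absent from the result
combined <- do.call(rbind, pieces)
print(combined[, 1:2])
```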

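[Note: this data frame comes from the Coursera R Programming course. This is also my first post on Stack Overflow and my first time learning to program. Thanks to everyone who offers solutions and advice, this forum is great.]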

4 answers:

Answer 0: (score: 1)

You just need an if/else in your function:

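rankall <- function(rank) {
    split_by_state <- split(df, df$state)
    ranked_hospitals <- lapply(split_by_state, function(x) {
        indx <- x$rank == rank
        if (any(indx)) {
            return(x[indx, ])
        } else {
            # No hospital with this rank: keep the state, set hospital to NA
            out <- x[1, ]
            out$hospital <- NA
            return(out)
        }
    })
    # Combine and return as in the question's original function
    do.call(rbind, ranked_hospitals)[, 1:2]
}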

Answer 1: (score: 1)

Here is another approach:

rankall <- function(rank) {  
  do.call(rbind, lapply(split(df, df$state), function(df) { 
    tmp <- df[df$rank == rank, 1:2]   
    if (!nrow(tmp)) return(transform(df[1, 1:2], hospital = NA)) else return(tmp) 
  })) 
}
rankall(3)
#   hospital state
#   AK  FAIRBANKS MEMORIAL HOSPITAL    AK
#   AL                         <NA>    AL
#   AR CRITTENDEN MEMORIAL HOSPITAL    AR
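
As a further alternative that is not taken from any of the answers here, a left join in base R avoids the explicit if/else: build a table with one row per state and merge the matching hospitals onto it with `all.x = TRUE`, so states without a match get NA. This is only a sketch; the function name `rankall_merge` and the argument `wanted_rank` are made up for illustration, and the data frame is passed in as an argument rather than read from the global environment.

```r
# Same data as in the question, rebuilt so the snippet runs on its own
df <- data.frame(
  hospital = c("PROVIDENCE ALASKA MEDICAL CENTER", "ALASKA REGIONAL HOSPITAL",
               "FAIRBANKS MEMORIAL HOSPITAL", "CRESTWOOD MEDICAL CENTER",
               "BAPTIST MEDICAL CENTER EAST", "ARKANSAS HEART HOSPITAL",
               "MEDICAL CENTER NORTH LITTLE ROCK", "CRITTENDEN MEMORIAL HOSPITAL"),
  state = c("AK", "AK", "AK", "AL", "AL", "AR", "AR", "AR"),
  rank = c(1, 2, 3, 1, 2, 1, 2, 3)
)

# Hypothetical helper: left-join the matching hospitals onto the list of states
rankall_merge <- function(data, wanted_rank) {
  # One row per state, so every state appears in the result
  states <- data.frame(state = sort(unique(as.character(data$state))))
  # Only the hospitals with the requested rank
  matches <- data[data$rank == wanted_rank, c("hospital", "state")]
  # all.x = TRUE keeps states without a match and fills hospital with NA
  out <- merge(states, matches, by = "state", all.x = TRUE)
  out[order(out$state), c("hospital", "state")]
}

res <- rankall_merge(df, 3)
print(res)
```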

Answer 2: (score: 1)

Here is another approach, using dplyr.

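library(dplyr)

# Picks the x-th row within each state, so it assumes the rows are
# already ordered by rank within each state (as they are in df).
fun1 <- function(x) {
  group_by(df, state) %>%
    summarise(hospital = hospital[x],
              rank = nth(rank, x))
}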

# fun1(3)
#Source: local data frame [3 x 3]
#
#  state                     hospital rank
#1    AK  FAIRBANKS MEMORIAL HOSPITAL    3
#2    AL                           NA   NA
#3    AR CRITTENDEN MEMORIAL HOSPITAL    3
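
The NA for AL comes from how R indexes vectors: asking for a position past the end of a vector returns NA rather than an error. The base R sketch below illustrates this with the two AL hospitals, and also shows that positional indexing only matches the rank when the rows are ordered by rank. All variable names here are illustrative only.

```r
# AL has only two hospitals in the question's data
al_hospitals <- c("CRESTWOOD MEDICAL CENTER", "BAPTIST MEDICAL CENTER EAST")
al_ranks <- c(1, 2)

second <- al_hospitals[2]  # a valid position returns the element
third <- al_hospitals[3]   # a position past the end returns NA, not an error
print(is.na(third))

# Positional indexing matches rank only when rows are ordered by rank;
# ordering first makes that explicit (rows deliberately reversed here)
shuffled_hospitals <- rev(al_hospitals)
shuffled_ranks <- rev(al_ranks)
by_rank <- shuffled_hospitals[order(shuffled_ranks)]
print(by_rank[1])
```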

Answer 3: (score: 0)

I think this is a good use case for dplyr. The only odd thing is that summarize complains when I use NA instead of "NA". Any idea why?

library(dplyr)
rankall <- function(chosen_rank){
  group_by(df, state) %>%
    summarize(hospital = ifelse(length(hospital[rank==chosen_rank])!=0,
                                as.character(hospital[rank==chosen_rank]), "NA"),
              rank = chosen_rank)
}

rankall(1)
rankall(2)
rankall(3)
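
A likely explanation for the complaint, offered as an assumption rather than something confirmed for the dplyr version used here: a bare NA is a logical value, so groups without a match would return a logical while the other groups return a character. The typed constant `NA_character_` avoids the mismatch and, unlike the string "NA", is still a real missing value. The base R sketch below shows the difference; the helper name `pick_hospital` is hypothetical.

```r
# A bare NA is logical; NA_character_ is a missing value of type character
na_class <- class(NA)
na_chr_class <- class(NA_character_)
print(c(na_class, na_chr_class))

# The string "NA" is ordinary text, not a missing value
print(is.na("NA"))

# Hypothetical helper: always returns a length-one character vector,
# using NA_character_ when no hospital has the chosen rank
pick_hospital <- function(hospital, rank, chosen_rank) {
  hit <- as.character(hospital[rank == chosen_rank])
  if (length(hit) > 0) hit[1] else NA_character_
}

# AL has ranks 1 and 2 only, so rank 3 yields a character NA
al_result <- pick_hospital(
  c("CRESTWOOD MEDICAL CENTER", "BAPTIST MEDICAL CENTER EAST"),
  c(1, 2),
  3
)
print(al_result)
```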