Remove NAs from data frame columns, leaving columns of different lengths after removal

Time: 2015-10-28 11:42:26

Tags: r

This is what my data looks like:

structure(list(
  `Name1` = c("Mark", NA, NA, NA, NA, NA),
  Name2 = c(NA, "Stefan", "Clara", NA, NA, NA),
  `Name3` = c(NA, NA, NA, "Max", "Pete", "Gabe"),
  `Name4` = c("Titan", NA_character_, NA_character_, NA_character_, NA_character_, NA_character_),
  `Name5` = c(NA_character_, NA_character_, NA_character_, NA_character_, "Tom", NA_character_),
  Name6 = c(NA_character_, "Narq", NA_character_, NA_character_, "Seba", NA_character_),
  Name7 = c(NA_character_, NA_character_, "Greg", NA_character_, NA_character_, NA_character_),
  Name8 = c(NA_character_, NA_character_, NA_character_, "Terry", NA_character_, NA_character_),
  Name9 = c(NA_character_, NA_character_, NA_character_, NA_character_, "Coaty", NA_character_),
  Name10 = c(NA_character_, NA_character_, "Meg", NA_character_, NA_character_, NA_character_)),
  .Names = c("Name1", "Name2", "Name3", "Name4", "Name5", "Name6", "Name7", "Name8", "Name9", "Name10"),
  row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")

So I want to remove all NAs from this data frame, even if that produces a data frame whose columns have different lengths.

Desired output:

  Name1  Name2 Name3 Name4 Name5 Name6 Name7 Name8 Name9 Name10
1  Mark Stefan   Max Titan   Tom  Narq  Greg Terry Coaty    Meg
2        Clara  Pete              Seba         
3               Gabe            

2 answers:

Answer 0: (score: 6)

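You can actually achieve something similar to your desired output programmatically, although I think NAs are better than "" because they work with any class and are easy to manipulate.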

First, we can define a helper function that drops the NAs from a vector and pads the result to a given size:

RemoveNAs <- function(x, size) {
  temp <- x[!is.na(x)]
  c(temp, rep("", size - length(temp)))
}
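
As a quick check of what this helper does on a single column, here is a minimal sketch. The vector is the question's Name2 column; the variable names `name2` and `padded` and the target size of 3 are chosen only for illustration.

```r
# Same helper as above, repeated so the snippet runs on its own.
RemoveNAs <- function(x, size) {
  temp <- x[!is.na(x)]                    # keep only the non-NA entries
  c(temp, rep("", size - length(temp)))   # pad with "" up to `size`
}

name2 <- c(NA, "Stefan", "Clara", NA, NA, NA)
padded <- RemoveNAs(name2, 3)
# padded is c("Stefan", "Clara", "")
```

Because the padding value is "", the helper as written is really only suited to character columns.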

Then, compute the largest number of non-NA values found in any column:

Max <- max(colSums(!is.na(df)))

Then, using data.table, I would simply do:

library(data.table)
setDT(df)[, lapply(.SD, RemoveNAs, Max)]
#    Name1  Name2 Name3 Name4 Name5 Name6 Name7 Name8 Name9 Name10
# 1:  Mark Stefan   Max Titan   Tom  Narq  Greg Terry Coaty    Meg
# 2:        Clara  Pete              Seba                         
# 3:               Gabe                                           

I think this is what you are trying to achieve, but as I said, NAs instead of "" would be better in the result, IMO.
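
For the NA-padded result recommended here, one possible base R sketch follows. `PadWithNA`, `df_small`, `max_len`, `result`, and `ragged` are hypothetical names, not part of the answer, and `df_small` is a three-column stand-in for the question's data. It relies on the fact that extending a vector with `length<-` fills the new positions with NA.

```r
# Hypothetical helper (not from the answer): drop NAs, then pad with NA.
PadWithNA <- function(x, size) {
  temp <- x[!is.na(x)]
  length(temp) <- size   # extending the length fills new slots with NA
  temp
}

# Small stand-in for the question's data (three of its columns).
df_small <- data.frame(
  Name1 = c("Mark", NA, NA, NA, NA, NA),
  Name2 = c(NA, "Stefan", "Clara", NA, NA, NA),
  Name3 = c(NA, NA, NA, "Max", "Pete", "Gabe"),
  stringsAsFactors = FALSE
)

# Largest number of non-NA values in any column.
max_len <- max(colSums(!is.na(df_small)))

# Rectangular result: values shifted to the top, NA underneath.
result <- as.data.frame(lapply(df_small, PadWithNA, max_len),
                        stringsAsFactors = FALSE)

# If truly ragged columns are wanted, a list can hold them,
# since a data frame requires all columns to have the same length.
ragged <- lapply(df_small, function(x) x[!is.na(x)])
```

The padded data frame keeps each column's class, while the list form gives columns of genuinely different lengths at the cost of no longer being a data frame.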

Answer 1: (score: -1)

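is.na(data) gives you a Boolean mask (for a data frame, a logical matrix) that is TRUE wherever the entry is <NA>. If you then use data[boolean_list] = value, the entries marked TRUE are replaced with value.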
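
A minimal sketch of this replacement, assuming a small character data frame; `data` and `na_mask` are illustrative names and the values are taken from the question's first two columns:

```r
data <- data.frame(
  Name1 = c("Mark", NA, NA),
  Name2 = c(NA, "Stefan", "Clara"),
  stringsAsFactors = FALSE
)

na_mask <- is.na(data)   # logical matrix, TRUE where the entry is NA
data[na_mask] <- ""      # overwrite every NA with the chosen value
# data$Name1 is now c("Mark", "", "")
# data$Name2 is now c("", "Stefan", "Clara")
```

Note that this only overwrites the NAs in place; it does not shift the remaining values to the top of each column as the desired output in the question does.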
