Delete rows and columns with NA but keep the values in R

Time: 2019-05-15 15:21:06

Tags: r dplyr na

I have a data frame with 6 columns and 10650 rows. Each cell contains either a value or NA, as in the following example:

Date         X10   X20   X30   X40    X50    X60
2012-01-01   0.5   0.6   NA    NA     NA     NA
2012-01-02   0.3   0.2   NA    NA     NA     NA
2012-01-03   0.5   0.6   NA    NA     NA     NA
2012-01-04   0.3   0.2   NA    NA     NA     NA
2012-01-05   NA    0.6   0.4   NA     NA     NA
2012-01-06   NA    0.2   1.2   NA     NA     NA
2012-01-07   NA    0.6   1.6   NA     NA     NA
2012-01-08   NA    NA    1.8   2.4    NA     NA
2012-01-09   NA    NA    2.1   3.2    NA     NA
2012-01-10   NA    NA    2.6   3.3    NA     NA
2012-01-11   NA    NA    NA    3.7    5.1    NA
2012-01-12   NA    NA    NA    3.9    5.7    NA
2012-01-13   NA    NA    NA    4.2    5.6    NA
2012-01-14   NA    NA    NA    NA     6.5    2.2
2012-01-15   NA    NA    NA    NA     6.9    2.9
2012-01-16   NA    NA    NA    NA     7.2    4.2

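Now I just want to remove the NAs and end up with 4 columns: the two non-NA values of each row as X1 and X2, and Xmin, the number taken from the name of the first non-NA column, as shown below: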

Date         X1    X2    Xmin   
2012-01-01   0.5   0.6   10   
2012-01-02   0.3   0.2   10    
2012-01-03   0.5   0.6   10
2012-01-04   0.3   0.2   10
2012-01-05   0.6   0.4   20
2012-01-06   0.2   1.2   20
2012-01-07   0.6   1.6   20
2012-01-08   1.8   2.4   30
2012-01-09   2.1   3.2   30 
2012-01-10   2.6   3.3   30 
2012-01-11   3.7   5.1   40 
2012-01-12   3.9   5.7   40  
2012-01-13   4.2   5.6   40
2012-01-14   6.5   2.2   50
2012-01-15   6.9   2.9   50
2012-01-16   7.2   4.2   50

I tried the approaches suggested on Stack Overflow:

> final[complete.cases(final), ]

> final <- na.omit(final)

Neither of them worked.
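Both attempts fail for the same reason: complete.cases() marks a row as complete only when it has no NA at all, and na.omit() drops every row that contains at least one NA. Since every row in this data has NAs in four of its six value columns, both return zero rows. A minimal sketch using three rows copied from the example above (the name df is just for illustration):

```r
# Three rows copied from the question's example (2012-01-01, 2012-01-05, 2012-01-14)
df <- data.frame(
  Date = c("2012-01-01", "2012-01-05", "2012-01-14"),
  X10 = c(0.5, NA, NA),
  X20 = c(0.6, 0.6, NA),
  X30 = c(NA, 0.4, NA),
  X40 = c(NA, NA, NA),
  X50 = c(NA, NA, 6.5),
  X60 = c(NA, NA, 2.2)
)

# Every row contains at least one NA, so no row counts as "complete"
complete.cases(df)    # FALSE FALSE FALSE
nrow(na.omit(df))     # 0
```

This is why the task needs a row-wise reshape (keep the non-NA values of each row) rather than a row filter.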

2 answers:

Answer 0 (score: 3)

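We can use apply. Loop over the rows (apply with MARGIN = 1) of the dataset without the 'Date' column, remove the NA elements (na.omit), create a data.frame from the remaining elements, with 'Xmin' taken from the column name of the first non-NA element (without the leading "X"), rbind the per-row results, and cbind them with the first column.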

cbind(df1[1], do.call(rbind, apply(df1[-1], 1, function(x) {
  x <- na.omit(x)  # keep only the non-NA values of this row
  data.frame(setNames(as.list(x), c("X1", "X2")),
             Xmin = as.numeric(sub("^X", "", names(x)[1])))  # "X10" -> 10
})))

Data

df1 <- structure(list(Date = c("2012-01-01", "2012-01-02", "2012-01-03", 
"2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08", 
"2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13", 
"2012-01-14", "2012-01-15", "2012-01-16"), X10 = c(0.5, 0.3, 
0.5, 0.3, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X20 = c(0.6, 
0.2, 0.6, 0.2, 0.6, 0.2, 0.6, NA, NA, NA, NA, NA, NA, NA, NA, 
NA), X30 = c(NA, NA, NA, NA, 0.4, 1.2, 1.6, 1.8, 2.1, 2.6, NA, 
NA, NA, NA, NA, NA), X40 = c(NA, NA, NA, NA, NA, NA, NA, 2.4, 
3.2, 3.3, 3.7, 3.9, 4.2, NA, NA, NA), X50 = c(NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, 5.1, 5.7, 5.6, 6.5, 6.9, 7.2), X60 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2.2, 2.9, 4.2
)), class = "data.frame", row.names = c(NA, -16L))
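For a single row, the function passed to apply() boils down to the steps below. This is a minimal sketch on one row copied from the question (2012-01-05); the variable names x and vals are only for illustration:

```r
# One row of the question's data (2012-01-05), without the Date column
x <- c(X10 = NA, X20 = 0.6, X30 = 0.4, X40 = NA, X50 = NA, X60 = NA)

vals <- na.omit(x)                          # drops the NA entries, keeps the names
as.numeric(vals)                            # 0.6 0.4  -> become X1 and X2
names(vals)                                 # "X20" "X30"
as.numeric(sub("^X", "", names(vals)[1]))   # 20       -> becomes Xmin
```

Note that this relies on every row having exactly two non-NA values; setNames(..., c("X1", "X2")) does not fit rows with more or fewer values.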

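Answer 1 (score: 1)

Here is a (tidyverse-based) solution. It does something similar to akrun's answer and is comparable in terms of performance. The only difference is perhaps readability, but that is probably a matter of preference: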

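library(dplyr)
library(purrr)

# df is the question's data frame, with Date in the first column.
# split() orders the groups by Date, so this assumes df is already
# sorted by Date with exactly one row per date.
df[2:ncol(df)] %>% 
    split(df$Date) %>%                  # one single-row data frame per date
    map_dfr(function(x) {
        cl <- na.omit(t(x))             # row -> one-column matrix, drop the NA entries
        Xmin <- rownames(cl)[1] %>% substr(., 2, nchar(.)) %>% as.numeric()  # "X10" -> 10
        tibble(X1 = cl[1,], X2 = cl[2,], Xmin = Xmin)
    }
    ) %>% 
    bind_cols(df["Date"], .)            # put Date back in front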

########### OUTPUT ############

# A tibble: 16 x 4
   Date          X1    X2  Xmin
   <date>     <dbl> <dbl> <dbl>
 1 2012-01-01   0.5   0.6    10
 2 2012-01-02   0.3   0.2    10
 3 2012-01-03   0.5   0.6    10
 4 2012-01-04   0.3   0.2    10
 5 2012-01-05   0.6   0.4    20
 6 2012-01-06   0.2   1.2    20
 7 2012-01-07   0.6   1.6    20
 8 2012-01-08   1.8   2.4    30
 9 2012-01-09   2.1   3.2    30
10 2012-01-10   2.6   3.3    30
11 2012-01-11   3.7   5.1    40
12 2012-01-12   3.9   5.7    40
13 2012-01-13   4.2   5.6    40
14 2012-01-14   6.5   2.2    50
15 2012-01-15   6.9   2.9    50
16 2012-01-16   7.2   4.2    50
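The core of the per-date function is the t() plus na.omit() step: t() turns the one-row data frame into a one-column matrix whose row names are the original column names, and na.omit() on a matrix drops every matrix row that contains an NA. A minimal sketch on one row copied from the question (2012-01-08), with x as an illustrative name:

```r
# One row of the question's data (2012-01-08) as a one-row data frame
x <- data.frame(X10 = NA_real_, X20 = NA_real_, X30 = 1.8,
                X40 = 2.4, X50 = NA_real_, X60 = NA_real_)

cl <- na.omit(t(x))   # 6 x 1 matrix -> 2 x 1 matrix without the NA rows
rownames(cl)          # "X30" "X40"
cl[, 1]               # named vector: X30 1.8, X40 2.4
as.numeric(substr(rownames(cl)[1], 2, nchar(rownames(cl)[1])))  # 30
```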