How to remove NA in character data in R

Date: 2019-11-13 10:09:51

Tags: r dataframe character do.call

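I want to copy the last two columns of each month to the beginning of the next month. I do it as shown below, but the data contains NA, and when I convert a column to character the program crashes. How can I copy the columns so that they keep their type?

My code: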

library(readxl)
library(tibble)

df<- read_excel("C:/Users/Rezerwa/Documents/Database.xlsx")

df=add_column(df, Feb1 = as.character(do.call(paste0, df["January...4"])), .after = "January...5")
df=add_column(df, Feb2 = as.numeric(do.call(paste0, df["January...5"])), .after = "Feb1")

My data:

df
# A tibble: 10 x 13
   Product January...2 January...3 January...4 January...5 February...6 February...7 February...8 February...9 March...10 March...11 March...12 March...13
   <chr>   <lgl>       <lgl>       <chr>             <dbl> <chr>               <dbl> <chr>               <dbl> <chr>           <dbl> <chr>           <dbl>
 1 a       NA          NA          754.00                4 754.00                  4 754.00                  4 754.00              4 754.00              4
 2 b       NA          NA          706.00                3 706.00                  3 706.00                  3 706.00              3 706.00              3
 3 c       NA          NA          517.00                3 517.00                  3 517.00                  3 517.00              3 517.00              3
 4 d       NA          NA          1466.00               9 1466.00                 9 1466.00                 9 1466.00             9 1466.00             9
 5 e       NA          NA          543.00                8 543.00                  8 543.00                  8 543.00              8 543.00              8
 6 f       NA          NA          NA                   NA NA                     NA NA                     NA NA                 NA NA                 NA
 7 g       NA          NA          NA                   NA NA                     NA NA                     NA NA                 NA NA                 NA
 8 h       NA          NA          NA                   NA NA                     NA NA                     NA NA                 NA NA                 NA
 9 i       NA          NA          1466.00               8 NA                     NA NA                     NA NA                 NA NA                 NA
10 j       NA          NA          NA                   NA 543.00                  3 NA                     NA NA                 NA NA                 NA

My error:

> df=add_column(df, Feb1 = as.character(do.call(paste0, df["January...4"])), .after = "January...5")
> df=add_column(df, Feb2 = as.numeric(do.call(paste0, df["January...5"])), .after = "Feb1")
Warning message:
In eval_tidy(xs[[i]], unique_output) : NAs introduced by coercion
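
One effect of the `paste0` route can be shown in isolation: it turns missing values into ordinary text, so the copied column no longer holds real NA values. The sketch below uses a made-up vector `x` as a stand-in for a column (it is not the data from the Excel file) and compares it with a plain copy:

```r
# Hypothetical character column with one missing value
x <- c("754.00", NA, "1466.00")

# paste0() converts every element to text, so the missing value
# becomes the ordinary two-letter string "NA"
pasted <- paste0(x)
anyNA(pasted)        # FALSE: the missing value is no longer marked as missing

# Assigning the vector directly keeps the type and the real NA
copied <- x
anyNA(copied)        # TRUE
```

With the data above this would presumably amount to passing `df[["January...4"]]` directly as the new column in `add_column`, without `do.call(paste0, ...)` or a type conversion. That call is untested here, since the Excel file is not available.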


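1 Answer:

Answer 0 (score: 1)

Using base R, we can split the columns into groups by the prefix of their names, select the last two columns from each group, and `cbind` them to the original `df`.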

df1 <- cbind(df, do.call(cbind, lapply(split.default(df[-1], 
      sub("\\..*", "", names(df)[-1])), function(x) {n <- ncol(x);x[, c(n-1, n)]})))
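
To make the grouping step easier to follow, here is the same idea broken into separate steps on a small made-up data frame. `toy` and its values are hypothetical, and `drop = FALSE` is an addition that keeps the result a data frame even if a group has a single column:

```r
# Hypothetical two-row data frame with the same naming pattern as df
toy <- data.frame(
  Product      = c("a", "b"),
  January...2  = c(1, 2),
  January...3  = c("x", NA),
  January...4  = c(4L, NA),
  February...5 = c(7, 8),
  February...6 = c("y", "z"),
  stringsAsFactors = FALSE
)

# Strip everything from the first dot onward to get the month prefix
prefix <- sub("\\..*", "", names(toy)[-1])

# split.default() treats the data frame as a list of columns and
# returns one data frame per prefix
groups <- split.default(toy[-1], prefix)

# Keep the last two columns of each group; column types are untouched
last_two <- lapply(groups, function(x) {
  n <- ncol(x)
  x[, c(n - 1, n), drop = FALSE]
})

sapply(last_two$January, class)
```

Because the columns are only subset and never pasted, the character column stays character, the integer column stays integer, and the missing values stay real NA values.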

To put the columns in calendar order, we can reorder them by month name:

cbind(df1[1], df1[-1][order(match(sub("\\..*", "", names(df1)[-1]), month.name))])
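
The ordering relies on the built-in constant `month.name`. A minimal sketch of that step on its own, using a hypothetical vector of column names `nms`:

```r
# Hypothetical column names in a non-calendar order
nms <- c("February...6", "January...2", "March...10", "January...3")

# Month prefix of each name
prefix <- sub("\\..*", "", nms)

# Position of each prefix in the calendar (January = 1, February = 2, ...)
pos <- match(prefix, month.name)

# order() sorts by calendar position; ties keep their original order
ord <- order(pos)
nms[ord]
# "January...2" "January...3" "February...6" "March...10"
```

Indexing the data frame's columns with `ord` then arranges them by month while keeping the columns of the same month in their existing relative order.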

Data

df <- structure(list(Product = structure(1:10, .Label = c("a", "b", 
"c", "d", "e", "f", "g", "h", "i", "j"), class = "factor"), January...2 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA), January...3 = c(NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA), January...4 = c(754, 706, 517, 
1466, 543, NA, NA, NA, 1466, NA), January...5 = c(4L, 3L, 3L, 
9L, 8L, NA, NA, NA, 8L, NA), February...6 = c(754, 706, 517, 
1466, 543, NA, NA, NA, NA, 543), February...7 = c(4L, 3L, 3L, 
9L, 8L, NA, NA, NA, NA, 3L), February...8 = c(754, 706, 517, 
1466, 543, NA, NA, NA, NA, NA), February...9 = c(4L, 3L, 3L, 
9L, 8L, NA, NA, NA, NA, NA), March...10 = c(754, 706, 517, 1466, 
543, NA, NA, NA, NA, NA), March...11 = c(4L, 3L, 3L, 9L, 8L, 
NA, NA, NA, NA, NA), March...12 = c(754, 706, 517, 1466, 543, 
NA, NA, NA, NA, NA), March...13 = c(4L, 3L, 3L, 9L, 8L, NA, NA, 
NA, NA, NA)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10"))