Question

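I have a data.table with columns of different data types. My goal is to select only the numeric columns and replace the NA values in those columns with 0. I know that replacing NA values with zero works like this: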

DT[is.na(DT)] <- 0
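A small base-R sketch of why restricting the replacement to numeric columns matters. On a table with mixed column types, the one-liner above also rewrites NAs in character columns, where the 0 is coerced to the string "0". The data.frame below is a hypothetical example; the question's data.table is assumed to behave the same way for this kind of matrix-indexed assignment, but this sketch sticks to a plain data.frame.

```r
# Hypothetical mixed-type data.frame (stringsAsFactors = FALSE keeps v2 character)
df <- data.frame(v1 = c(NA, 1, 2),
                 v2 = c(NA, "A", "B"),
                 stringsAsFactors = FALSE)

# Replace every NA, regardless of column type
df[is.na(df)] <- 0

# The numeric column gets 0, but the character column gets the string "0"
df$v1
df$v2
```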

To select only the numeric columns, I found this solution, which works fine:

DT[, as.numeric(which(sapply(DT,is.numeric))), with = FALSE]

I can achieve what I want by assigning

DT2 <- DT[, as.numeric(which(sapply(DT,is.numeric))), with = FALSE]

and then running:

DT2[is.na(DT2)] <- 0

But of course I want to modify my original DT by reference. However, with the following:

DT[, as.numeric(which(sapply(DT,is.numeric))), with = FALSE]
                 [is.na(DT[, as.numeric(which(sapply(DT,is.numeric))), with = FALSE])]<- 0

I get

"Error in [.data.table([...] i is invalid type (matrix)"

What am I missing? Any help is much appreciated!!
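The error message points at the i argument. A minimal sketch, assuming data.table is installed and reusing the example data from the answers below: is.na() applied to a data.table (as to any data.frame) returns a logical matrix with one column per table column, and that matrix is what ends up being passed as i. The names num and m are illustrative only.

```r
library(data.table)

set.seed(24)
DT <- data.table(v1 = c(NA, 1:4), v2 = c(NA, LETTERS[1:4]), v3 = c(rnorm(4), NA))

# Subset to the numeric columns, as in the question (this makes a copy)
num <- DT[, which(sapply(DT, is.numeric)), with = FALSE]

# is.na() on a data.table/data.frame returns a logical matrix (rows x columns),
# which [.data.table rejects as its i argument
m <- is.na(num)
is.matrix(m)
dim(m)
```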

Answer 1

We can use set:

for(j in seq_along(DT)){
    set(DT, i = which(is.na(DT[[j]]) & is.numeric(DT[[j]])), j = j, value = 0)
 }

Or create an index of the numeric columns, loop over it, and use set to replace the NA values with 0:

ind <-   which(sapply(DT, is.numeric))
for(j in ind){
    set(DT, i = which(is.na(DT[[j]])), j = j, value = 0)
}

Data

set.seed(24)
DT <- data.table(v1= c(NA, 1:4), v2 = c(NA, LETTERS[1:4]), v3=c(rnorm(4), NA))
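A usage sketch, assuming data.table is installed: running the index-based loop on the example data. set() writes into DT in place, so there is no assignment and no copy, and the character column v2 keeps its NA.

```r
library(data.table)

set.seed(24)
DT <- data.table(v1 = c(NA, 1:4), v2 = c(NA, LETTERS[1:4]), v3 = c(rnorm(4), NA))

# Positions of the numeric columns (v1 and v3 here)
ind <- which(sapply(DT, is.numeric))

# set() updates DT by reference; only rows where the column is NA are touched
for (j in ind) {
  set(DT, i = which(is.na(DT[[j]])), j = j, value = 0)
}

DT
```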

Answer 2

I'd like to explore, and possibly improve on, the excellent answer given by @akrun. Here is the data he used in his example:

library(data.table)

set.seed(24)
DT <- data.table(v1= c(NA, 1:4), v2 = c(NA, LETTERS[1:4]), v3=c(rnorm(4), NA))
DT

#>    v1   v2         v3
#> 1: NA <NA> -0.5458808
#> 2:  1    A  0.5365853
#> 3:  2    B  0.4196231
#> 4:  3    C -0.5836272
#> 5:  4    D         NA

And the two approaches he suggested:

fun1 <- function(x){
  for(j in seq_along(x)){
  set(x, i = which(is.na(x[[j]]) & is.numeric(x[[j]])), j = j, value = 0)
  }
}

fun2 <- function(x){
  ind <-   which(sapply(x, is.numeric))
  for(j in ind){
    set(x, i = which(is.na(x[[j]])), j = j, value = 0)
  }
}

I think the first approach above is really genius, because it exploits the fact that NAs are typed.
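A small base-R illustration of what "typed NAs" means here and how fun1's row condition evaluates. The vectors v1 and v2 are hypothetical stand-ins for single columns. is.numeric() returns one TRUE/FALSE for a whole column, and & recycles that single value against the per-row is.na() result, so a non-numeric column yields no rows to set.

```r
# NA comes in typed flavours; only the integer and double ones are numeric
is.numeric(NA_integer_)    # TRUE
is.numeric(NA_real_)       # TRUE
is.numeric(NA_character_)  # FALSE
is.numeric(NA)             # FALSE: a bare NA is logical

# fun1's row condition on a character column: no rows selected
v2 <- c(NA, "A", "B")
rows_v2 <- which(is.na(v2) & is.numeric(v2))   # integer(0)

# ... and on an integer column: the NA row is selected
v1 <- c(NA, 1L, 2L)
rows_v1 <- which(is.na(v1) & is.numeric(v1))   # 1
```

One consequence of this type check: a column that contains only NA and was created as logical is not numeric, so fun1 (and every other approach here that tests is.numeric) leaves it alone.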

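First of all, even though .SD is not available in the i argument, it is possible to pull the column name with get(), so I thought I could sub-assign the data.table this way:

fun3 <- function(x){
  nms <- names(x)[sapply(x, is.numeric)]
  for(j in nms){
    x[is.na(get(j)), (j):=0]
  }
}

The generic approach, of course, would be to rely on .SD and .SDcols to work only on the numeric columns:

fun4 <- function(x){
  nms <- names(x)[sapply(x, is.numeric)]
  x[, (nms):=lapply(.SD, function(i) replace(i, is.na(i), 0)), .SDcols=nms]
}

But then I thought to myself: "Hey, who says we can't go all the way to base R for this sort of operation?" Here is a simple lapply() with a conditional statement, wrapped in setDT():

fun5 <- function(x){
  # builds and returns a new data.table; x itself is not modified in place
  setDT(
    lapply(x, function(i){
      if(is.numeric(i))
        i[is.na(i)]<-0
      i
    })
  )
}

Finally, we can use the same conditional to limit the columns to which we apply set():

fun6 <- function(x){
  for(j in seq_along(x)){
    if (is.numeric(x[[j]]) )
      set(x, i = which(is.na(x[[j]])), j = j, value = 0)
  }
}

Here are the benchmarks: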
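The benchmark output itself is not reproduced here. One thing any timing harness for these functions has to handle is that fun1, fun2, fun3, fun4 and fun6 modify their argument by reference, so after the first run there are no NAs left to replace. The sketch below, assuming data.table is installed, gives each run a fresh copy() and checks that the .SD/.SDcols version (fun4) and the set() version (fun6) fill in the same values. time_fun is a hypothetical helper built on base system.time() only; packages such as microbenchmark are the more usual choice for real measurements.

```r
library(data.table)

set.seed(24)
DT <- data.table(v1 = c(NA, 1:4), v2 = c(NA, LETTERS[1:4]), v3 = c(rnorm(4), NA))

# fun4 and fun6 as defined above
fun4 <- function(x){
  nms <- names(x)[sapply(x, is.numeric)]
  x[, (nms) := lapply(.SD, function(i) replace(i, is.na(i), 0)), .SDcols = nms]
}
fun6 <- function(x){
  for (j in seq_along(x)) {
    if (is.numeric(x[[j]]))
      set(x, i = which(is.na(x[[j]])), j = j, value = 0)
  }
}

# Both functions change their argument in place, so work on copies
res4 <- copy(DT); fun4(res4)
res6 <- copy(DT); fun6(res6)

# Hypothetical timing helper: a fresh copy per repetition (copy time is included)
time_fun <- function(f, data, reps = 100) {
  system.time(for (r in seq_len(reps)) f(copy(data)))
}
t6 <- time_fun(fun6, DT)
```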

Answer 3

You need the tidyverse purrr function map_if, along with ifelse, to do the job in a single line of code.

library(tidyverse)
library(data.table)  # needed for data.table() and as.data.table()
set.seed(24)
DT <- data.table(v1 = sample(c(1:3,NA), 20, replace = TRUE), v2 = sample(c(LETTERS[1:3],NA), 20, replace = TRUE), v3 = sample(c(1:3,NA), 20, replace = TRUE))

The one-liner below takes a DT with both numeric and non-numeric columns and operates only on the numeric columns to replace NA with 0:

DT %>% map_if(is.numeric,~ifelse(is.na(.x),0,.x)) %>% as.data.table

So the tidyverse can sometimes be less verbose than data.table :-)
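One detail about the ifelse() step, shown here in base R only and not part of the original answer: because the replacement value 0 is a double, ifelse() returns a double vector as soon as at least one NA is replaced, even if the input column is integer (v1 and v3 in this example data are integer, since they come from sample(c(1:3, NA), ...)). If keeping the integer type matters, replace() with an integer 0L is one alternative. The vector x is a hypothetical single column.

```r
x <- c(NA, 1L, 2L)              # an integer column with a missing value

a <- ifelse(is.na(x), 0, x)     # the yes-value 0 is double, so the result is double
b <- replace(x, is.na(x), 0L)   # integer 0L keeps the integer type

typeof(a)   # "double"
typeof(b)   # "integer"
```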

Replace NA with 0, only in numeric columns in data.table
