Replace specific strings with NA in a data.table

Time: 2017-11-06 13:00:49

Tags: r data.table na

I am building a function to replace certain strings such as "-" with NA in a data.table in R.

My function is as follows:

na_replacer <- function(data_set, characters_to_replace) {
  text_features <- names(data_set)[sapply(data_set, class) %in% c("character","factor")]
  for (x in text_features) {
    data_set[, lapply(.SD, function(x) replace(x, which(x==any(characters_to_replace)), NA))]
  }
  return (data_set)
}

Here is an example dataset and a call to the function:

DT = data.table(ID = c("foo","bar","-","foo","[]","bah"), a = 1:6, b = 7:12, c = 13:18, d = c("aaa", "bbb", "ccc", "_", "eeee", "ffff"))
DT <- na_replacer(data_set = DT, characters_to_replace = c('-', '_', '[]'))

When I run this function, I get the following error:

Error in charToDate(x) :
  character string is not in a standard unambiguous format

Please help me get this function working as intended, or is there perhaps a shorter way to do what I am trying to do?

Dataset before:

    ID a  b  c    d
1: foo 1  7 13  aaa
2: bar 2  8 14  bbb
3:   - 3  9 15  ccc
4: foo 4 10 16    _
5:  [] 5 11 17 eeee
6: bah 6 12 18 ffff

Expected dataset after:

    ID a  b  c    d
1: foo 1  7 13  aaa
2: bar 2  8 14  bbb
3:  NA 3  9 15  ccc
4: foo 4 10 16   NA
5:  NA 5 11 17 eeee
6: bah 6 12 18 ffff


4 answers:

Answer 0 (score: 2)

Please test this modified function, which works on a data.table.

na_replacer <- function(data_set, characters_to_replace = c('-', '_')) {
    library(data.table)
    setDT(data_set)
    text_features <- names(data_set)[sapply(data_set, class) %in% c("character", "factor")]
    for (x in text_features) {
        foo <- data_set[, get(x)]
        data_set[, eval(x) := ifelse(foo %in% characters_to_replace, NA, foo)]
    }
    return(data_set)
}

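Answer 1 (score: 1)

The OP has asked to replace certain strings with NA in all columns of a data.table that are of type character or factor.

The previously accepted answer fails for factor columns.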
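One likely reason, assuming the failure comes from an ifelse() call on a factor column: in base R, ifelse() does not keep the factor class and returns the underlying integer codes instead. The sketch below uses a made-up factor f (not from the thread) and contrasts ifelse() with plain subset assignment.

```r
# A factor whose levels are given explicitly, so the integer codes are 1, 2, 3
f <- factor(c("aaa", "_", "ccc"), levels = c("aaa", "_", "ccc"))

# ifelse() returns the underlying integer codes, not a factor
via_ifelse <- ifelse(f %in% "_", NA, f)
print(via_ifelse)            # 1 NA 3
print(is.factor(via_ifelse)) # FALSE

# Subset assignment keeps the factor class and only blanks the matching entries
via_subset <- f
via_subset[via_subset %in% "_"] <- NA
print(as.character(via_subset)) # "aaa" NA "ccc"
```

Subset assignment leaves the level "_" in the level set; droplevels() can remove it if that matters.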

The following two approaches also work for factor columns:

Update in a join

library(data.table)
options(datatable.print.class = TRUE)

for (col in DT[, names(.SD)[lapply(.SD, class) %in% c("character", "factor")]]) {
  DT[.(chr = c("-", "_", "[]")), on = paste0(col, "==chr"), (col) := NA_character_][]
}
DT
       ID     a     b     c      d
   <char> <int> <int> <int> <fctr>
1:    foo     1     7    13    aaa
2:    bar     2     8    14    bbb
3:     NA     3     9    15    ccc
4:    foo     4    10    16     NA
5:     NA     5    11    17   eeee
6:    bah     6    12    18   ffff

Using set()

for (col in DT[, names(.SD)[lapply(.SD, class) %in% c("character", "factor")]]) {
  set(DT, DT[get(col) %in% c("-", "_", "[]"), which = TRUE], col, NA_character_)
}
DT
       ID     a     b     c      d
   <char> <int> <int> <int> <fctr>
1:    foo     1     7    13    aaa
2:    bar     2     8    14    bbb
3:     NA     3     9    15    ccc
4:    foo     4    10    16     NA
5:     NA     5    11    17   eeee
6:    bah     6    12    18   ffff

Data

The sample dataset provided by the OP in the latest update is used with one modification: column d is coerced to factor.

DT <- data.table(ID = c("foo", "bar", "-", "foo", "[]", "bah"), 
                 a = 1:6, b = 7:12, c = 13:18, 
                 d = factor(c("aaa", "bbb", "ccc", "_", "eeee", "ffff")))
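Since the OP asked for a function, the set() approach could be wrapped roughly as follows. This is a minimal sketch, not the answerer's code: the name na_replacer_set is hypothetical, and it assumes the data.table package is installed and that modifying the input by reference is acceptable.

```r
library(data.table)

# Hypothetical wrapper around the set() approach; updates data_set by reference
na_replacer_set <- function(data_set, characters_to_replace) {
  # character and factor columns only
  is_text <- vapply(data_set,
                    function(col) is.character(col) || is.factor(col),
                    logical(1))
  for (col in names(data_set)[is_text]) {
    # row numbers whose value exactly matches one of the strings
    rows <- which(data_set[[col]] %in% characters_to_replace)
    set(data_set, i = rows, j = col, value = NA_character_)
  }
  invisible(data_set)
}

DT <- data.table(ID = c("foo", "bar", "-", "foo", "[]", "bah"),
                 a = 1:6, b = 7:12, c = 13:18,
                 d = factor(c("aaa", "bbb", "ccc", "_", "eeee", "ffff")))
na_replacer_set(DT, c("-", "_", "[]"))
print(DT)
```

Because set() works by reference, there is no need to reassign the result to DT; pass copy(DT) instead if the original should stay untouched.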

Answer 2 (score: 0)

Something like this works:

na_replacer <- function(data_set, characters_to_replace) {
  text_features <- names(data_set)[sapply(data_set, class) %in% c("character","factor")]
  for (x in text_features) {
    # exact matches only, so "[]" is treated as a literal string and not as a regex
    data_set[[x]][data_set[[x]] %in% characters_to_replace] <- NA
  }
  return (data_set)
}

Answer 3 (score: 0)

Check this out:

solution <- function(dt, replacer) {
  # replace NA in every column; columns become character when replacer is a string
  dt[] <- lapply(dt, function(x) ifelse(is.na(x), replacer, x))
  dt
}

# example:
dt <- data.frame(x = c(1, 4, NA, NA, 54), y = c(5, NA, -1, 0, 5))
cat("before:\n")
dt
cat("after:\n")
solution(dt, "-")

It replaces all NA values in a data.frame with the given symbol.