I am building a function that replaces strings such as "-" and "_" with NA in a data.table in R. My function is as follows:
na_replacer <- function(data_set, characters_to_replace) {
  text_features <- names(data_set)[sapply(data_set, class) %in% c("character","factor")]
  for (x in text_features) {
    data_set[, lapply(.SD, function(x) replace(x, which(x==any(characters_to_replace)), NA))]
  }
  return (data_set)
}
Here is a sample dataset and a call to the function:
DT = data.table(ID = c("foo","bar","-","foo","[]","bah"), a = 1:6, b = 7:12, c = 13:18, d = c("aaa", "bbb", "ccc", "_", "eeee", "ffff"))
DT <- na_replacer(data_set = DT, characters_to_replace = c('-', '_', '[]'))
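When I run this function, I get the following error:
Error in charToDate(x) :
  character string is not in a standard unambiguous format
Please help me make this function work as expected. Or is there perhaps a shorter way to do what I am trying to do?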
Dataset before:
    ID a  b  c    d
1: foo 1  7 13  aaa
2: bar 2  8 14  bbb
3:   - 3  9 15  ccc
4: foo 4 10 16    _
5:  [] 5 11 17 eeee
6: bah 6 12 18 ffff
Expected dataset after:
    ID a  b  c    d
1: foo 1  7 13  aaa
2: bar 2  8 14  bbb
3:  NA 3  9 15  ccc
4: foo 4 10 16   NA
5:  NA 5 11 17 eeee
6: bah 6 12 18 ffff
Answer 0 (score: 2)
Please test this modified function, which works on a data.table:
na_replacer <- function(data_set, characters_to_replace = c('-', '_')) {
  library(data.table)
  setDT(data_set)
  text_features <- names(data_set)[sapply(data_set, class) %in% c("character", "factor")]
  for (x in text_features) {
    foo <- data_set[, get(x)]
    data_set[, eval(x) := ifelse(foo %in% characters_to_replace, NA, foo)]
  }
  return(data_set)
}
Answer 1 (score: 1)
The OP has asked to replace certain strings with NA in all columns of a data.table that are of type character or factor.
The previously accepted answer fails for factor columns.
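A short base R sketch of why an ifelse()-based replacement misbehaves on factors. The vector f is made up for illustration. ifelse() drops the factor attributes, so the result holds integer level codes rather than labels; converting to character first avoids this.

```r
# Illustrative factor, not taken from the thread
f <- factor(c("aaa", "_", "ccc"))

# ifelse() strips attributes: the result is an integer vector of level codes
codes <- ifelse(f %in% "_", NA, f)

# Converting to character first keeps the labels
labels <- ifelse(f %in% "_", NA, as.character(f))

print(codes)
print(labels)
```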
The following two approaches, an update join and set(), also work for factor columns:
library(data.table)
options(datatable.print.class = TRUE)
for (col in DT[, names(.SD)[lapply(.SD, class) %in% c("character", "factor")]]) {
  DT[.(chr = c("-", "_", "[]")), on = paste0(col, "==chr"), (col) := NA_character_][]
}
DT
       ID     a     b     c      d
   <char> <int> <int> <int> <fctr>
1:    foo     1     7    13    aaa
2:    bar     2     8    14    bbb
3:     NA     3     9    15    ccc
4:    foo     4    10    16     NA
5:     NA     5    11    17   eeee
6:    bah     6    12    18   ffff
set()
for (col in DT[, names(.SD)[lapply(.SD, class) %in% c("character", "factor")]]) {
  set(DT, DT[get(col) %in% c("-", "_", "[]"), which = TRUE], col, NA_character_)
}
DT
       ID     a     b     c      d
   <char> <int> <int> <int> <fctr>
1:    foo     1     7    13    aaa
2:    bar     2     8    14    bbb
3:     NA     3     9    15    ccc
4:    foo     4    10    16     NA
5:     NA     5    11    17   eeee
6:    bah     6    12    18   ffff
The sample dataset provided by the OP in the latest update is used with one modification: column d is coerced to factor:
DT <- data.table(ID = c("foo", "bar", "-", "foo", "[]", "bah"),
a = 1:6, b = 7:12, c = 13:18,
d = factor(c("aaa", "bbb", "ccc", "_", "eeee", "ffff")))
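Since the question asks for a reusable function, the set() approach could be wrapped roughly as follows. This is a minimal sketch: the name na_replacer_set and its argument checks are assumptions for illustration, not part of the answer, and it assumes the data.table package is installed. It modifies the data.table by reference.

```r
library(data.table)

# Hypothetical helper: replaces the given strings with NA in all
# character and factor columns of a data.table, by reference.
na_replacer_set <- function(data_set, characters_to_replace) {
  stopifnot(is.data.table(data_set))
  is_text <- vapply(data_set,
                    function(col) is.character(col) || is.factor(col),
                    logical(1))
  for (col in names(data_set)[is_text]) {
    rows <- which(data_set[[col]] %in% characters_to_replace)
    if (length(rows) > 0L) {
      set(data_set, i = rows, j = col, value = NA_character_)
    }
  }
  invisible(data_set)
}

DT <- data.table(ID = c("foo", "bar", "-", "foo", "[]", "bah"),
                 a = 1:6, b = 7:12, c = 13:18,
                 d = factor(c("aaa", "bbb", "ccc", "_", "eeee", "ffff")))
na_replacer_set(DT, c("-", "_", "[]"))
print(DT)
```

Replaced values stay behind as unused factor levels; droplevels() can remove them if that matters.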
Answer 2 (score: 0)
Something like this works:
na_replacer <- function(data_set, characters_to_replace) {
  text_features <- names(data_set)[sapply(data_set, class) %in% c("character","factor")]
  for (x in text_features) {
    # Exact matching with %in% avoids regex escaping issues for strings like "[]"
    data_set[[x]][data_set[[x]] %in% characters_to_replace] <- NA
  }
  return (data_set)
}
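Pattern matching and exact matching can give different results for this task, which a small base R sketch makes visible. The vector x and the pattern are made up for illustration. A character class such as [-_] flags any string that contains one of the characters, while %in% flags only strings equal to one of the targets.

```r
# Illustrative values, not taken from the thread
x <- c("foo", "-", "a-b", "[]", "_")
targets <- c("-", "_", "[]")

exact <- x %in% targets      # TRUE only where the whole string is a target
regex <- grepl("[-_]", x)    # TRUE wherever "-" or "_" occurs anywhere

print(data.frame(x, exact, regex))
```

Here "a-b" is flagged only by the regex, and "[]" only by the exact match.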
Answer 3 (score: 0)
Check this out:
solution <- function(dt, replacer) {
  result <- do.call(cbind, lapply(dt, function(x) lapply(x, function(x) { ifelse(is.na(x), replacer, x) } )))
  as.data.frame(result)
}
# example:
dt <- data.frame(x = c(1, 4, NA, NA, 54), y = c(5, NA, -1, 0, 5))
cat("before:")
dt
cat("after:")
solution(dt, "-")
It replaces all NA values in the data.frame with the given symbol.
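For comparison, the same NA-to-symbol replacement can be sketched with logical-matrix indexing in base R. This is an alternative, not the answerer's method; note that columns receiving the string become character.

```r
dt <- data.frame(x = c(1, 4, NA, NA, 54), y = c(5, NA, -1, 0, 5))

# is.na() on a data.frame returns a logical matrix usable as an index
filled <- dt
filled[is.na(filled)] <- "-"

print(filled)
```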