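I have a very large dataset and have not used data.table before. I find the syntax a bit hard to follow. My main question is how to reproduce the behaviour of the apply-family functions in data.table.
My data look like this: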
dat1 <- structure(list(id = c(1L, 1L, 2L, 3L), diag1 = structure(1:4, .Label = c("I20.1","I21.3", "I48", "I60.8"), class = "factor"), diag2 = structure(c(3L,2L, 1L, 1L), .Label = c("", "I50", "I60.9"), class = "factor"), diag3 = structure(c(1L, 2L, 1L, 1L), .Label = c("", "I38.1"), class = "factor")), .Names = c("id", "diag1", "diag2", "diag3"), row.names = c(NA, -4L), class = "data.frame")
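I want to add a variable that flags every record with a diagnosis code starting with I20, I21 or I60 in any of the columns diag1, diag2 or diag3. Using apply and a regex, I have done the following: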
code.list <- c("I20","I21","I60")
dat1$index <- apply(dat1[2:4],1, function(i) any(grep(paste(code.list,
collapse="|"), i)))
This gives me the final dataset I want, shown below:
structure(list(id = c(1L, 1L, 2L, 3L), diag1 = structure(1:4, .Label = c("I20.1","I21.3", "I48", "I60.8"), class = "factor"), diag2 = structure(c(3L,2L, 1L, 1L), .Label = c("", "I50", "I60.9"), class = "factor"),diag3 = structure(c(1L, 2L, 1L, 1L), .Label = c("", "I38.1"), class = "factor"), index = c(TRUE, TRUE, FALSE, TRUE)), .Names = c("id","diag1", "diag2", "diag3", "index"), row.names = c(NA, -4L), class = "data.frame")
However, using plyr takes too long. I would like to see the data.table syntax for this. Can anyone help?
Thanks in advance
A
Answer 0 (score: 0)
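We can use data.table. setDT converts dat1 to a data.table by reference, lapply runs grepl over each column listed in .SDcols, and Reduce combines the resulting logical vectors with `|`: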
library(data.table)
setDT(dat1)[, index := Reduce(`|`, lapply(.SD, grepl,
pattern = paste(code.list, collapse="|"))), .SDcols = 2:4]
dat1
#   id diag1 diag2 diag3 index
#1:  1 I20.1 I60.9        TRUE
#2:  1 I21.3   I50 I38.1  TRUE
#3:  2   I48             FALSE
#4:  3 I60.8              TRUE
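To make the one-liner above easier to read, here is a minimal sketch that splits it into its two steps on the same toy data. The names dat, pat, diag_cols and hits are only illustrative and do not come from the question; the data is rebuilt with data.table() rather than setDT so the block runs on its own.

```r
library(data.table)

# Same toy data as in the question, built directly as a data.table.
dat <- data.table(
  id    = c(1L, 1L, 2L, 3L),
  diag1 = factor(c("I20.1", "I21.3", "I48", "I60.8")),
  diag2 = factor(c("I60.9", "I50", "", "")),
  diag3 = factor(c("", "I38.1", "", ""))
)
code.list <- c("I20", "I21", "I60")

# Build the regex once: "I20|I21|I60".
pat <- paste(code.list, collapse = "|")

# Refer to the diagnosis columns by name instead of by position (2:4).
diag_cols <- c("diag1", "diag2", "diag3")

# Step 1: one logical vector per diagnosis column. grepl is vectorised
# over rows, so no row-wise apply() is needed.
hits <- dat[, lapply(.SD, grepl, pattern = pat), .SDcols = diag_cols]

# Step 2: combine the columns element-wise with OR, so a row is TRUE
# when at least one of its diagnosis columns matched.
dat[, index := Reduce(`|`, hits)]

dat$index
# TRUE TRUE FALSE TRUE
```

Note that the pattern is unanchored, so it matches the codes anywhere in the string. If the codes should only match at the start, something like paste0("^(", paste(code.list, collapse = "|"), ")") could be used as the pattern instead.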