data.table grep error with a large file but not with the small example

Time: 2019-03-04 22:10:30

Tags: r grep data.table

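I want to search a specified set of columns in a data.table for a given string and, in the rows where it is found, set the value of a different column.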

Here is the basic structure, which works with this small data set.

dt <- structure(list(Abstract = c("RCP", "RCP8.5", "Another string"
), Author.Keywords = c("Random key words", "", "Crop system; Environmental sustainability"), RCP = c("None", "None", "None")), class = c("data.table", 
 "data.frame"), row.names = c(NA, -3L))

On this data.table, grep looks for "RCP" in the Abstract and Author.Keywords columns and writes "RCP" to the RCP column wherever it is found.

dt[grep("RCP", c(Abstract, Author.Keywords), perl = TRUE, ignore.case = TRUE), RCP := "RCP"]

However, I have a data.table called livestock with 1,632 rows and 34 columns. This is the message I get when I try to run the same code on it.

livestock[grep("RCP", c(Abstract, Author.Keywords), perl = TRUE, ignore.case = TRUE), RCP := "RCP"]

Error in `[.data.table`(livestock, grep("RCP", c(Abstract, Author.Keywords),  : 
  i[16] is 1825 which is out of range [1,nrow=1632]

It looks like my grep call is searching past the end of the data.table, but why? And how do I fix it?

Using grepl instead of grep returns:

Error in `[.data.table`(livestock, grepl("RCP", c(Abstract, Author.Keywords),  : 
  i evaluates to a logical vector length 3264 but there are 1632 rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle.

1 Answer:

Answer 0: (score: 2)

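We specify the columns of interest in .SDcols, loop over the Subset of Data.table (.SD) with lapply, and check each column for the string "RCP" with grepl. This returns a list of logical vectors, one per column, which is Reduced to a single logical vector with | (element-wise OR).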

i1 <- livestock[, Reduce("|", lapply(.SD, function(x) 
     grepl("RCP", x))), .SDcols = c("Abstract", "Author.Keywords")]
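To make the Reduce step concrete, here is a minimal sketch on a small made-up table. The name toy and its contents are assumptions for illustration only, not the asker's livestock data. It shows the per-column logical vectors and how | and & combine them into one value per row.

```r
library(data.table)

# Hypothetical toy data: row 3 matches only in Author.Keywords
toy <- data.table(
  Abstract        = c("RCP", "RCP8.5", "Another string", "No match here"),
  Author.Keywords = c("Random key words", "", "RCP4.5; crop system",
                      "Environmental sustainability")
)

# One logical vector per column, each with one element per row
per_col <- lapply(toy, function(x) grepl("RCP", x))

# Element-wise OR: TRUE where at least one column matches
any_col <- Reduce("|", per_col)

# Element-wise AND: TRUE only where every column matches
all_col <- Reduce("&", per_col)

print(any_col)
print(all_col)
```

Note that grepl is case-sensitive by default, whereas the call in the question used ignore.case = TRUE. If case-insensitive matching is wanted, that argument can be passed to grepl here as well.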

If the substring "RCP" must be present in all of the columns specified in .SDcols, use & instead of | in Reduce.

i1 <- livestock[, Reduce("&", lapply(.SD, function(x) 
     grepl("RCP", x))), .SDcols = c("Abstract", "Author.Keywords")]

Use the logical vector in i to subset the rows and assign "RCP" to the RCP column.

livestock[i1, RCP := "RCP"]
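As for why the original call failed: c(Abstract, Author.Keywords) stacks the two columns end to end, so grep searches a vector twice as long as the table. Any match in the second column is reported at a position greater than nrow. The sketch below reproduces this on a hypothetical four-row table (toy is an assumed name, not the asker's data). The small example in the question happened to work only because none of its Author.Keywords values matched.

```r
library(data.table)

# Hypothetical toy data: row 3 matches only in Author.Keywords
toy <- data.table(
  Abstract        = c("RCP", "RCP8.5", "Another string", "No match here"),
  Author.Keywords = c("Random key words", "", "RCP4.5; crop system",
                      "Environmental sustainability"),
  RCP             = "None"
)

# Stacking the columns doubles the length of the searched vector
stacked <- toy[, c(Abstract, Author.Keywords)]
n_stacked <- length(stacked)   # 2 * nrow(toy)

# Positions refer to the stacked vector, not to table rows
hits <- grep("RCP", stacked)
out_of_range <- hits[hits > nrow(toy)]

# The positions can be folded back onto row numbers with modular arithmetic
rows <- unique((hits - 1L) %% nrow(toy) + 1L)

# The per-column approach from the answer avoids the problem altogether
i1 <- toy[, Reduce("|", lapply(.SD, function(x) grepl("RCP", x))),
          .SDcols = c("Abstract", "Author.Keywords")]
toy[i1, RCP := "RCP"]
print(toy)
```

This is consistent with the error in the question: 1825 - 1632 = 193, so the sixteenth match was presumably in row 193 of Author.Keywords.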