Question

I have a data.table in the following sample format.

library(data.table)

dt <- data.table(l = c("apple", "ball", "cat"),
                 m = c(1, 2, 3),
                 n = c("I ate apple", "I played ball", "cat ate pudding"))

I want to apply sub to column n in each row, with the pattern taken from another column (l). How can I do that?

The output I am looking for is:

       l m               n           o
1: apple 1     I ate apple       I ate
2:  ball 2   I played ball    I played
3:   cat 3 cat ate pudding ate pudding

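I tried using mapply(do.call, list(sub), ...) together with the assignment operator inside data.table, but the arguments to sub (pattern, replacement, string) need to be a nested list of {{1} and I am stuck on how to write this correctly.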

Answer 1

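So we want a row-by-row computation, stored as a new column o.

mapply is definitely the right family of functions, but mapply (and sapply) simplify their output before returning it. data.table likes lists. Map is just a convenient shortcut for mapply(..., SIMPLIFY = FALSE), so it does not modify the return value.

The following is the computation we are after, but it is still not quite right. (data.table interprets the list output as separate columns.)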
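To make the difference between Map and mapply concrete, here is a minimal base R sketch. It uses plain vectors that mirror columns l and n from the question rather than the data.table itself; the names as_list and as_vec are made up for illustration.

```r
# Plain vectors standing in for columns l and n (illustrative only).
l <- c("apple", "ball", "cat")
n <- c("I ate apple", "I played ball", "cat ate pudding")

# Map() always returns a list with one element per "row".
as_list <- Map(sub, l, "", n)

# mapply() simplifies the same results to a named character vector.
as_vec <- mapply(sub, l, "", n)

str(as_list)
str(as_vec)

# Map() behaves like mapply() with SIMPLIFY = FALSE.
same <- identical(as_list, mapply(sub, l, "", n, SIMPLIFY = FALSE))
```

Because every call returns a single string here, the simplified vector and the list carry the same values; the two forms start to differ in shape once the per-row results have different lengths or types.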

> dt[, Map(sub, l, '', n)]
    apple      ball          cat
1: I ate  I played   ate pudding

So we wrap it in one more list to get the output we are after:

> dt[, .(Map(sub, l, '', n))]
             V1
1:       I ate 
2:    I played 
3:  ate pudding

Now we can assign it with :=

> dt[, o := Map(sub, l, '', n)]
> dt
       l m               n            o
1: apple 1     I ate apple       I ate 
2:  ball 2   I played ball    I played 
3:   cat 3 cat ate pudding  ate pudding

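Edit: as pointed out, this makes o a list column.

We can avoid this by using standard mapply, although I prefer the "one size fits all" approach of Map (each row produces one output, held in a list; this always works regardless of what the output looks like, and we can do the type conversion at the end).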

dt[, o := mapply(sub, l, '', n)]
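As a sketch of the "convert at the end" idea, the list returned by Map can be flattened before it is assigned, so that o ends up as an ordinary character column. This assumes every row yields exactly one string (true for the sample data); the trimws step is an optional extra to drop the leftover spaces and is not part of the answer above.

```r
library(data.table)

dt <- data.table(l = c("apple", "ball", "cat"),
                 m = c(1, 2, 3),
                 n = c("I ate apple", "I played ball", "cat ate pudding"))

# Row-wise sub() via Map(), then flatten the list to a character vector.
# Only safe when each row produces exactly one value.
dt[, o := unlist(Map(sub, l, "", n), use.names = FALSE)]

# Optional: strip the whitespace left behind by the removed word.
dt[, o := trimws(o)]

print(dt)
```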

Answer 2

We can take a vectorized approach: paste the contents of 'l' together, use the result as the pattern argument of sub to remove the substring, and create the new column 'o'.

dt[, o := trimws(sub(paste(l, collapse="|"), "", n))]
dt
#       l m               n           o
#1: apple 1     I ate apple       I ate
#2:  ball 2   I played ball    I played
#3:   cat 3 cat ate pudding ate pudding
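One thing to keep in mind with the collapsed pattern is that it matches any value of l in every row, not only the value from the same row. The sketch below uses made-up vectors (not the question's data) in which each string contains both words, to show how the two approaches can diverge.

```r
# Hypothetical data: each string contains both candidate words.
l <- c("apple", "ball")
n <- c("ball and apple", "apple and ball")

# Collapsed pattern "apple|ball": removes whichever word appears first.
collapsed <- sub(paste(l, collapse = "|"), "", n)

# Row-wise: removes only the word taken from the same position in l.
row_wise <- mapply(sub, l, "", n, USE.NAMES = FALSE)

print(collapsed)
print(row_wise)
```

For the sample data in the question both approaches give the same result, since each n contains only its own l.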

How to apply a different multi-argument function to each row of a data.table?
