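My data table has two text columns (col1 and col2). Both contain sentences. I want to find all the words in col1 that also appear in col2, and return a string made of the words in col1 minus the words found in col2. Here is an example: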
col1 | col2 | output
america, uk have too much money | uk, uk money too too | america, have much
Answer 0 (score: 1)
Something like this?
library(data.table)
DT <- data.table(col1 = "america, uk have too much money", col2 = "uk, uk money too too")
pat <- "(\\s+)|(?!')(?=[[:punct:]])"  # split on whitespace, or just before punctuation other than an apostrophe
# Keep the col1 tokens that do not occur in col2; [[1]] means only the first row is handled
DT[, output := {w1 <- strsplit(col1, pat, perl = TRUE)[[1]]; w2 <- strsplit(col2, pat, perl = TRUE)[[1]]; paste(w1[!(w1 %in% w2)], collapse = " ")}]
It does not keep the comma, though.
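Because the code above indexes with `[[1]]`, it only processes the first row. Here is a minimal sketch of a variant that handles every row, written in base R so it does not depend on how the table is built. It assumes a "word" is a run of letters or apostrophes, so punctuation is dropped rather than kept. `remove_words` is a hypothetical helper name, and the second example row is made up for illustration.

```r
# Hypothetical helper: for each pair (x[i], y[i]), drop from x[i] every word
# that also appears in y[i]. Assumes words are runs of letters/apostrophes.
remove_words <- function(x, y) {
  # regmatches + gregexpr returns one character vector of words per input string
  tokens <- function(s) regmatches(s, gregexpr("[[:alpha:]']+", s))
  mapply(
    function(wx, wy) paste(wx[!(wx %in% wy)], collapse = " "),
    tokens(x), tokens(y),
    USE.NAMES = FALSE
  )
}

col1 <- c("america, uk have too much money", "a b c")
col2 <- c("uk, uk money too too", "b")
output <- remove_words(col1, col2)
print(output)
```

With a data.table that has columns named col1 and col2, the same helper could be applied as `DT[, output := remove_words(col1, col2)]`.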