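I have a dataset with two string variables. Both contain sentences that I want to compare word by word. I want to create a new column ("new_var") that should look like this: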
var1 var2 new_var
"sentence numer one" "setence numer two" sentence:setence + one:two
"another one is here" "aner one are hre" another:aner + is:are + here:hre
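I don't know how to write code that works on a dataset, that is, code that adds a new column based on a condition and a loop. My code only works when I define var1 and var2 as standalone objects, like this: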
library(stringr)
var1 = "this is sentence numer one"
var2 = "this is setence numer two"
# Note: for() returns NULL, so new_var is NULL here; cat() only prints the pairs
new_var <- for (i in 1:(lengths(gregexpr("\\s+", var1)) + 1)) {
  if (word(string = var1, start = i, end = i) != word(string = var2, start = i, end = i)) {
    cat(word(string = var1, start = i, end = i), word(string = var2, start = i, end = i), "+", sep = ":")
  } else {
    cat("")
  }
}
Answer 0 (score: 1)
One possibility is to use str_split first and then map2 from the purrr package.
First, I create some dummy data:
x <- c("sentence number one", "another one is here")
y <- c("setence number two", "aner one are hre")
Then I transform it:
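library(stringr)  # str_split()
library(purrr)    # map2()
# Split each sentence into a vector of words
x2 <- str_split(x, " ")
y2 <- str_split(y, " ")
# "" where the words match, "word1:word2" where they differ
map2(x2, y2, ~ifelse(.x == .y, "", paste(.x, .y, sep = ":")))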
[[1]]
[1] "sentence:setence" "" "one:two"
[[2]]
[1] "another:aner" "" "is:are" "here:hre"
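The list above still contains empty strings and is not yet a column. A minimal sketch of how it might be turned into the new_var column from the question follows. The data frame df and the helper diff_words are hypothetical names introduced here for illustration, and the sketch assumes that both sentences in a row have the same number of words.

```r
library(stringr)
library(purrr)

# Hypothetical data frame standing in for the asker's dataset
df <- data.frame(
  var1 = c("sentence numer one", "another one is here"),
  var2 = c("setence numer two", "aner one are hre"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: compare two sentences word by word.
# Assumes both sentences contain the same number of words.
diff_words <- function(a, b) {
  w1 <- str_split(a, "\\s+")[[1]]
  w2 <- str_split(b, "\\s+")[[1]]
  differs <- w1 != w2
  # Keep only the differing pairs and join them with " + "
  paste(paste(w1[differs], w2[differs], sep = ":"), collapse = " + ")
}

# map2_chr returns a character vector, so it can be assigned as a column
df$new_var <- map2_chr(df$var1, df$var2, diff_words)
df$new_var
```

Rows whose sentences are identical get an empty string. If the sentences can differ in length, R will recycle the shorter word vector, so a length check would be needed before comparing.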