Column_A Column_B
lehman electronics "simplifying technology home the lehman world is now digital."
levan group "the levan group \n \n home dlocation aspx \t"
life botanica "of denton txt life botanica"
My goal is to match the first word of Column_A against the entire string in Column_B and, if it matches, return "Match".
I tried the following code:
matchColumn <- function(dataColumn, searchColumn)
{
desc <- searchColumn[which(grepl(unlist(strsplit(dataColumn," "))[1], searchColumn))]
desc <- ifelse(length(desc) == 0, NA, "Match")
return(desc)
}
file_new1$CombinationMatch <- sapply(file_new1$Column_A, matchColumn, file_new1$Column_B)
But it gives me a strange error:
Error in which(grepl(unlist(strsplit(dataColumn, " "))[1], searchColumn)) :
  argument to 'which' is not logical
Answer 0 (score: 2)
Here is a modified version that returns what I think you are after:
matchColumn <- function(dataColumn, searchColumn) {
# your indexing was off here
keyword <- sapply(strsplit(dataColumn, " "), function(i) i[1])
# you don't need which here, but you do need ignore.case = TRUE
desc <- sapply(seq_along(keyword), function(x) grepl(keyword[x], searchColumn[x], ignore.case = TRUE))
ifelse(desc, "Match", NA)
}
Then just:
file_new1$CombinationMatch <- with(file_new1, matchColumn(Column_A, Column_B))
> file_new1
Column_A Column_B CombinationMatch
1 Leman Electronics simplifying technology home the lehman world is now digital. <NA>
2 Levan Group the levan group \n \n home dlocation aspx \t Match
3 life botanica of denton txt life botanica Match
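For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The sample data frame is rebuilt from the output printed above, and `match_first_word` is a hypothetical helper name, not part of the original answer. It differs from `matchColumn` in one respect: it lower-cases both sides and uses `fixed = TRUE`, so the keyword is treated as a literal string rather than a regular expression. That is an assumption about what is wanted; if you need regex matching, stay with `ignore.case = TRUE` as above.

```r
# Sample data rebuilt from the printed output above (assumed to be character, not factor)
file_new1 <- data.frame(
  Column_A = c("Leman Electronics", "Levan Group", "life botanica"),
  Column_B = c(
    "simplifying technology home the lehman world is now digital.",
    "the levan group \n \n home dlocation aspx \t",
    "of denton txt life botanica"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: row-wise, literal, case-insensitive first-word match
match_first_word <- function(data_col, search_col) {
  # First space-delimited word of each entry, lower-cased
  keyword <- tolower(vapply(strsplit(data_col, " ", fixed = TRUE), `[`, character(1), 1))
  # Compare row i of the keywords with row i of the search column only;
  # fixed = TRUE means characters such as "." or "(" are matched literally
  hit <- mapply(grepl, keyword, tolower(search_col),
                MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
  ifelse(hit, "Match", NA_character_)
}

file_new1$CombinationMatch <- match_first_word(file_new1$Column_A, file_new1$Column_B)
print(file_new1$CombinationMatch)
```

Lower-casing both columns is used here instead of `ignore.case = TRUE` because `grepl` warns that `ignore.case` is ignored when combined with `fixed = TRUE`.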