argument to 'which' is not logical, grepl

Date: 2017-06-19 09:38:54

Tags: r

Column_A            Column_B
lehman electronics  "simplifying technology home the lehman world is now digital."
levan group         "the levan group  \n \n      home   dlocation aspx \t"
life botanica       "of denton  txt life botanica"

My goal is to match the first word of Column_A against the entire string in Column_B and return "Match" if it is found.
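To make the goal concrete, here is a minimal sketch for a single row, assuming a case-insensitive substring match is what counts as "Match". It also shows the difference between testing the keyword against the string in the same row and against the whole column: `grepl()` is vectorised over its `x` argument, so passing all of `Column_B` returns one logical value per row.

```r
# Sample values copied from the question's data
a <- "lehman electronics"
b <- c("simplifying technology home the lehman world is now digital.",
       "the levan group  \n \n      home   dlocation aspx \t",
       "of denton  txt life botanica")

# First word of Column_A: split on a space and keep the first piece
first_word <- strsplit(a, " ")[[1]][1]
first_word  # "lehman"

# Same-row comparison: a single TRUE/FALSE
grepl(first_word, b[1], ignore.case = TRUE)  # TRUE

# Whole-column comparison: one logical per element of b
grepl(first_word, b, ignore.case = TRUE)  # TRUE FALSE FALSE
```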

I tried the following code:

matchColumn <- function(dataColumn, searchColumn)
{ 
  desc <- searchColumn[which(grepl(unlist(strsplit(dataColumn," "))[1], searchColumn))]
  desc <- ifelse(length(desc) == 0, NA, "Match") 
  return(desc) 
}
file_new1$CombinationMatch <- sapply(file_new1$Column_A, matchColumn, file_new1$Column_B)

but it gives me a strange error:

Error in which(grepl(unlist(strsplit(dataColumn, " "))[1], searchColumn)) : 
  argument to 'which' is not logical
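For reference, `which()` accepts only logical input, and any other type raises exactly this message. The minimal check below illustrates the message only; it does not by itself pinpoint which value in the code above was non-logical.

```r
# which() returns the positions of the TRUE values in a logical vector
which(c(FALSE, TRUE, TRUE))  # 2 3

# Non-logical input (here a character string) is rejected with this error
msg <- tryCatch(which("Match"), error = function(e) conditionMessage(e))
msg  # "argument to 'which' is not logical" (in an English locale)
```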

1 answer:

Answer 0 (score: 2)

Here is a modified version that returns what I think you're after:

matchColumn <- function(dataColumn, searchColumn) {

    # your indexing was off here: take the first word of each element
    # (as.character() guards against factor columns)
    keyword <- sapply(strsplit(as.character(dataColumn), " "), function(i) i[1])

    # you don't need which() here, but you do need ignore.case = TRUE;
    # each keyword is tested only against the string in the same row
    desc <- sapply(seq_along(keyword), function(x) grepl(keyword[x], searchColumn[x], ignore.case = TRUE))

    ifelse(desc, "Match", NA)

}
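A variation on the same idea, not part of the original answer and offered only as a sketch; `matchFirstWord()` is a hypothetical name. `mapply()` pairs each keyword with the `Column_B` string in the same row, and lowercasing both sides allows `fixed = TRUE`, so characters such as `.` in a name are matched literally instead of as regular-expression syntax. `fixed = TRUE` does not honour `ignore.case`, which is why `tolower()` is used instead.

```r
# Hypothetical helper: row-wise, case-insensitive, literal (non-regex) match
matchFirstWord <- function(dataColumn, searchColumn) {
  # First word of each element; as.character() handles factor columns
  keyword <- vapply(strsplit(as.character(dataColumn), " "),
                    function(w) w[1], character(1))
  # Lowercase both sides so fixed = TRUE can be used without ignore.case
  hit <- mapply(function(k, s) grepl(k, s, fixed = TRUE),
                tolower(keyword), tolower(as.character(searchColumn)),
                USE.NAMES = FALSE)
  ifelse(hit, "Match", NA)
}

matchFirstWord(c("lehman electronics", "a.b corp"),
               c("the Lehman world", "axb holdings"))
# "Match" NA
```

With default regex matching, `grepl("a.b", "axb holdings")` would be `TRUE`, because `.` matches any character; `fixed = TRUE` avoids that.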

Then just:

file_new1$CombinationMatch <- with(file_new1, matchColumn(Column_A, Column_B))
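A self-contained run, assuming `file_new1` is a plain data frame built from the three sample rows in the question (the `\n` and `\t` in the second string are written here as real newline and tab characters) and using a copy of the answer's `matchColumn()`:

```r
# Rebuild the question's sample data (assumed structure)
file_new1 <- data.frame(
  Column_A = c("lehman electronics", "levan group", "life botanica"),
  Column_B = c("simplifying technology home the lehman world is now digital.",
               "the levan group  \n \n      home   dlocation aspx \t",
               "of denton  txt life botanica"),
  stringsAsFactors = FALSE
)

# Copy of the answer's function
matchColumn <- function(dataColumn, searchColumn) {
  keyword <- sapply(strsplit(as.character(dataColumn), " "), function(i) i[1])
  desc <- sapply(seq_along(keyword), function(x)
    grepl(keyword[x], searchColumn[x], ignore.case = TRUE))
  ifelse(desc, "Match", NA)
}

file_new1$CombinationMatch <- with(file_new1, matchColumn(Column_A, Column_B))
file_new1$CombinationMatch  # "Match" "Match" "Match"
```

With the lowercase sample exactly as shown in the question, every first word occurs in its own row's `Column_B`, so all three rows come back as "Match".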

> file_new1
           Column_A                                                     Column_B CombinationMatch
1 Leman Electronics simplifying technology home the lehman world is now digital.             <NA>
2       Levan Group          the levan group  \n \n      home   dlocation aspx \t            Match
3     life botanica                                 of denton  txt life botanica            Match