R: grep() when the pattern is a vector

Asked: 2016-12-15 14:38:47

Tags: r design-patterns grep grepl

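I have two data frames. df1 contains a small set of compounds I have been working with, and I want to see which of them also appear in df2. The problem is that many values in df2 do not exactly match the values in df1 (e.g. df1 = "Altretamin", df2 = "Altretamine" or "Altretamin Hydrochloride" or "ALTRETAMIN HCL"). To get around these mismatches and the upper/lower case differences, I wanted to use an ifelse / grepl / grep construct, but grep on its own runs into trouble in the loop when it finds no match (NULL), and when I try to work around that with an ifelse statement I get the following warnings: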

    df1 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "Cabozantinib", "Altretamine"))
    df2 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "ALTRETAMINE HCL", "Masitinib"))

    index <- NULL
    for (i in 1:length(df1$CompoundName)) {
      index[i] <- 
        ifelse(grepl(df1$CompoundName[i], df2$CompoundName, ignore.case = TRUE), 
               grep(df1$CompoundName[i], df2$CompoundName, ignore.case = TRUE), 0)
      print(index[i])
    }
    index

which gives:

[1] 1
[1] 0
[1] 0
[1] 0
Warning messages:
1: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName,  :
  number of items to replace is not a multiple of replacement length
2: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName,  :
  number of items to replace is not a multiple of replacement length
3: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName,  :
  number of items to replace is not a multiple of replacement length
4: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName,  :
  number of items to replace is not a multiple of replacement length
> index
[1] 1 0 0 0

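As far as I can tell, for each "i" grepl returns a whole vector of TRUE/FALSE values rather than a single TRUE or FALSE that I could use inside the loop. Is there a way around this? Or is there simply another way to match inexact patterns?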
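
For what it's worth, the warnings are consistent with that reading: ifelse() is vectorised and returns a vector as long as its test (here, one element per row of df2), so assigning that result to the single slot index[i] keeps only the first element. Note also that grep() returns integer(0), not NULL, when nothing matches. Below is a minimal sketch of one possible workaround, assuming that only the first matching row of df2 is wanted for each compound and that names should be compared as literal text rather than as regular expressions. The helper name first_match is hypothetical, not part of base R.

    df1 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "Cabozantinib", "Altretamine"),
                      stringsAsFactors = FALSE)
    df2 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "ALTRETAMINE HCL", "Masitinib"),
                      stringsAsFactors = FALSE)

    # first_match (hypothetical helper): index of the first element of
    # `candidates` that contains `name`, ignoring case, or 0 if none does.
    first_match <- function(name, candidates) {
      # fixed = TRUE cannot be combined with ignore.case = TRUE,
      # so both sides are lower-cased before the literal search.
      hits <- grep(tolower(name), tolower(candidates), fixed = TRUE)
      if (length(hits) == 0) 0L else hits[1]
    }

    # One integer per row of df1; vapply enforces the length-1 result.
    index <- vapply(df1$CompoundName, first_match, integer(1),
                    candidates = df2$CompoundName, USE.NAMES = FALSE)
    index  # 1 2 0 3

With this, index > 0 flags the df1 compounds that were found in df2. The search only checks whether the df1 name is contained in the df2 name, so a longer df1 entry would not match a shorter df2 entry.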

0 answers:

No answers yet