Question

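I have two data frames. df1 contains a small subset of compounds that I have been working with, and I want to see which of those compounds are also present in df2. The problem is that many of the values in df2 do not exactly match the values in df1 (e.g. df1 = "Altretamin", df2 = "Altretamine" or "Altretamin Hydrochloride" or "ALTRETAMIN HCL"). To handle these inexact matches and the upper/lower case differences, I wanted to use ifelse / grepl / grep, but grep on its own runs into problems in the loop when it hits NULL, and when I try to work around that with an ifelse statement I get the following warnings: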

    df1 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "Cabozantinib", "Altretamine"))
    df2 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "ALTRETAMINE HCL", "Masitinib"))

    index <- NULL
    for (i in 1:length(df1$CompoundName)) {
      index[i] <- 
        ifelse(grepl(df1$CompoundName[i], df2$CompoundName, ignore.case = TRUE), 
               grep(df1$CompoundName[i], df2$CompoundName, ignore.case = TRUE), 0)
      print(index[i])
    }
    index

which gives

[1] 1
[1] 0
[1] 0
[1] 0
Warning messages:
1: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName,  :
  number of items to replace is not a multiple of replacement length
2: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName,  :
  number of items to replace is not a multiple of replacement length
3: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName,  :
  number of items to replace is not a multiple of replacement length
4: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName,  :
  number of items to replace is not a multiple of replacement length
> index
[1] 1 0 0 0

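As far as I can tell, grepl returns a whole vector of TRUE/FALSE values for each "i" rather than a single TRUE or FALSE that I could use inside the loop. Is there a way around these problems? Or is there simply another way to match inexact patterns?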
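To make the behaviour described above concrete, here is a minimal sketch using the same df1 and df2. It shows that grepl returns one logical per element of df2$CompoundName, and one possible way to reduce grep's result to a single position per compound. The helper name first_match and the choice to keep only the first hit (or 0 when nothing matches) are assumptions made for illustration, not an established solution. Each name is treated as a regular expression, which is fine for these plain names but may need escaping for names containing special characters.

```r
df1 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "Cabozantinib", "Altretamine"))
df2 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "ALTRETAMINE HCL", "Masitinib"))

# grepl() tests the pattern against every element of df2$CompoundName,
# so the result has length nrow(df2), not length 1.
hits <- grepl("Altretamine", df2$CompoundName, ignore.case = TRUE)
print(hits)

# Hypothetical helper: position of the first case-insensitive match,
# or 0 when grep() returns an empty vector (no match).
first_match <- function(pattern, candidates) {
  pos <- grep(pattern, candidates, ignore.case = TRUE)
  if (length(pos) == 0) 0L else pos[1]
}

# One integer per compound in df1; vapply() enforces the length-1 result.
index <- vapply(as.character(df1$CompoundName), first_match, integer(1),
                candidates = as.character(df2$CompoundName),
                USE.NAMES = FALSE)
print(index)
```

Because first_match always returns exactly one value, the assignment no longer triggers the "number of items to replace" warning. If a compound could match several rows of df2, this sketch silently keeps only the first one.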

R: grep() pattern is a vector

0 answers: