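I have two data frames. df1 contains a small set of compounds I have been working with, and I want to see which of those compounds are also part of df2. The problem is that many of the values in df2 do not exactly match the values in df1 (e.g. df1 = "Altretamin", df2 = "Altretamine" or "Altretamin Hydrochloride" or "ALTRETAMIN HCL"). To handle these inexact matches and the upper/lower-case differences, I wanted to use an ifelse / grepl / grep construction. However, grep on its own causes trouble in the loop when there is no match and it returns an empty result, and when I try to work around that with an ifelse statement, I get the warnings shown below: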
df1 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "Cabozantinib", "Altretamine"))
df2 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "ALTRETAMINE HCL", "Masitinib"))
index <- NULL
for (i in 1:length(df1$CompoundName)) {
  index[i] <-
    ifelse(grepl(df1$CompoundName[i], df2$CompoundName, ignore.case = TRUE),
           grep(df1$CompoundName[i], df2$CompoundName, ignore.case = TRUE), 0)
  print(index[i])
}
index
which gives:
[1] 1
[1] 0
[1] 0
[1] 0
Warning messages:
1: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName, :
number of items to replace is not a multiple of replacement length
2: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName, :
number of items to replace is not a multiple of replacement length
3: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName, :
number of items to replace is not a multiple of replacement length
4: In index[i] <- ifelse(grepl(df1$CompoundName[i], df2$CompoundName, :
number of items to replace is not a multiple of replacement length
> index
[1] 1 0 0 0
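As far as I can tell, for each "i" grepl returns a whole vector of TRUE/FALSE values (one per row of df2) rather than a single TRUE or FALSE that I could use inside the loop. Is there a way around these problems? Or is there simply another way to match inexact patterns?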
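To illustrate what I am after, here is a minimal sketch of one possible workaround. It assumes that only the first matching row in df2 is needed for each compound and that 0 should mark "no match"; the helper name first_match is hypothetical, and the compound names are treated as regular expressions, which is fine here because they contain no special characters.

```r
df1 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "Cabozantinib", "Altretamine"))
df2 <- data.frame(CompoundName = c("Bosutinib", "Nilotinib", "ALTRETAMINE HCL", "Masitinib"))

# grepl() returns one logical per element of df2$CompoundName, not a single value
grepl(as.character(df1$CompoundName[1]), df2$CompoundName, ignore.case = TRUE)

# Hypothetical helper: position of the first match in `targets`, or 0 if there is none
first_match <- function(name, targets) {
  hits <- grep(name, targets, ignore.case = TRUE)  # integer(0) when nothing matches
  if (length(hits) > 0) hits[1] else 0L
}

# vapply() guarantees exactly one integer per compound in df1
index <- vapply(as.character(df1$CompoundName), first_match, integer(1),
                targets = df2$CompoundName, USE.NAMES = FALSE)
index
# [1] 1 2 0 3
```

Checking length(hits) with a plain if, instead of calling ifelse on the grepl vector, yields a single value per iteration, so the "number of items to replace" warning should not appear. If a single TRUE/FALSE is enough, any(grepl(...)) would collapse the vector the same way.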