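I have two data frames: one is the target data frame (targetframe), and the other serves as a library of key values (word.library). I need the following algorithm: it should look for approximate matches between word.library$mainword and targetframe$words. Once an approximate match has been found, the matching substring in targetframe$words should be replaced with the corresponding word.library$keyID.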
Here are the two data frames described above:
targetframe <- data.frame(words = c("This is sentence one with the important word",
                                    "This is sentence two with the inportant woord",
                                    "This is sentence three with crazy sayings"))
word.library <- data.frame(mainword = c("important word",
                                        "crazy sayings"),
                           keyID = c("1001",
                                     "2001"))
Here is my solution:
for (i in seq_len(nrow(word.library))) {
  # locate the approximate match of the i-th library entry in every sentence
  positions <- aregexec(word.library[i, 1], targetframe$words, max.distance = 0.1)
  res <- regmatches(targetframe$words, positions)
  res[lengths(res) == 0] <- "XXXX"  # placeholder for rows with no match
  # replace the matched substring with the key ID, row by row
  targetframe$words <- Vectorize(gsub)(unlist(res), word.library[i, 2], targetframe$words)
}
targetframe$words
However, I am using a very inefficient for loop (assume I have two huge data frames). Does anyone know how to solve this more efficiently?
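One direction I have considered, as a rough sketch rather than a benchmarked solution: keep the loop over the library entries, but splice the key in by position from the aregexec() result instead of running Vectorize(gsub) on every row. The helper name replace_approx is hypothetical (it is not from any package), and the sketch assumes that only the first approximate match per sentence needs replacing.

```r
# Hypothetical helper (not from any package): for each library entry, replace
# its first approximate match in every element of `text` with the given key.
replace_approx <- function(text, patterns, keys, max.distance = 0.1) {
  text <- as.character(text)
  patterns <- as.character(patterns)
  keys <- as.character(keys)
  for (i in seq_along(patterns)) {
    pos <- aregexec(patterns[i], text, max.distance = max.distance)
    # Start and length of the whole match per element (-1 means no match)
    start <- vapply(pos, function(p) as.integer(p[1L]), integer(1))
    len <- vapply(pos, function(p) as.integer(attr(p, "match.length")[1L]),
                  integer(1))
    hit <- which(start > 0L)
    if (length(hit) == 0L) next
    # Rebuild only the matching rows by position; no second regex pass
    before <- substr(text[hit], 1L, start[hit] - 1L)
    after <- substr(text[hit], start[hit] + len[hit], nchar(text[hit]))
    text[hit] <- paste0(before, keys[i], after)
  }
  text
}

targetframe <- data.frame(words = c("This is sentence one with the important word",
                                    "This is sentence two with the inportant woord",
                                    "This is sentence three with crazy sayings"))
word.library <- data.frame(mainword = c("important word", "crazy sayings"),
                           keyID = c("1001", "2001"))

targetframe$words <- replace_approx(targetframe$words,
                                    word.library$mainword,
                                    word.library$keyID)
targetframe$words
```

This still calls aregexec() once per library entry, so it probably only reduces the per-row overhead; whether it is fast enough for two huge data frames is something I have not measured.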