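I have words and their synonyms in one data frame. In a different data frame, I have sentences. I want to search the sentences for the synonyms from the other data frame and, if one is found, replace it with the word whose synonym it is.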
dt = read.table(header = TRUE,
text = "Word Synonyms
Use 'employ, utilize, exhaust, spend, expend, consume, exercise'
Come 'advance, approach, arrive, near, reach'
Go 'depart, disappear, fade, move, proceed, recede, travel'
Run 'dash, escape, elope, flee, hasten, hurry, race, rush, speed, sprint'
Hurry 'rush, run, speed, race, hasten, urge, accelerate, bustle'
Hide 'conceal, cover, mask, cloak, camouflage, screen, shroud, veil'
", stringsAsFactors = FALSE)
mydf = read.table(header = TRUE, stringsAsFactors = FALSE,
text ="sentence
'I can utilize this file'
'I can cover these things'
")
The desired output looks like this:
I can Use this file
I can Hide these things
The above is only a sample. My real dataset has more than 10,000 sentences.
Answer 0 (score: 2)
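You can replace "," with "|" in dt$Synonyms so that each entry can be used as the pattern argument of gsub. Then, for each row of dt, use dt$Synonyms as the pattern and replace any occurrence of those words (now separated by "|") with dt$Word, applying this to every sentence with sapply and gsub.
Edited: added a word-boundary check (as part of the pattern in gsub), as suggested by the OP.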
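A minimal sketch of the approach described above (an illustration, not necessarily the answerer's exact code). It assumes dt and mydf as built in the question, and whole-word, case-sensitive matching:

```r
# Data from the question
dt <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "Word Synonyms
Use 'employ, utilize, exhaust, spend, expend, consume, exercise'
Come 'advance, approach, arrive, near, reach'
Go 'depart, disappear, fade, move, proceed, recede, travel'
Run 'dash, escape, elope, flee, hasten, hurry, race, rush, speed, sprint'
Hurry 'rush, run, speed, race, hasten, urge, accelerate, bustle'
Hide 'conceal, cover, mask, cloak, camouflage, screen, shroud, veil'
")
mydf <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "sentence
'I can utilize this file'
'I can cover these things'
")

# Turn "employ, utilize, ..." into the regex alternation "employ|utilize|..."
dt$Synonyms <- gsub(",\\s*", "|", dt$Synonyms)

# For each sentence, try every Word/Synonyms row in turn.
# \\b on both sides restricts matches to whole words (the word-boundary fix),
# so e.g. "near" is not replaced inside "nearly".
mydf$sentence <- sapply(mydf$sentence, function(s) {
  for (i in seq_len(nrow(dt))) {
    s <- gsub(paste0("\\b(", dt$Synonyms[i], ")\\b"), dt$Word[i], s, perl = TRUE)
  }
  s
}, USE.NAMES = FALSE)

print(mydf$sentence)
```

Because the synonym lists overlap (for example "rush" appears under both Run and Hurry), the row that comes first in dt wins. For 10,000+ sentences it may be faster to swap the loops, running one gsub per row of dt over the whole sentence vector at once, since gsub is vectorised over its x argument.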
Answer 1 (score: 1)
Here is a tidyverse solution...
library(stringr)
library(dplyr)
library(tidyr)  # unnest() comes from tidyr, not dplyr
dt2 <- dt %>%
  mutate(Synonyms = str_split(Synonyms, ",\\s*")) %>%  # split into words
  unnest(Synonyms) %>%  # long data frame: one Word/synonym pair per row
  distinct(Synonyms, .keep_all = TRUE)  # some synonyms (e.g. "rush") belong to several Words; keep the first so the join cannot duplicate words
mydf2 <- mydf %>%
  mutate(Synonyms = str_split(sentence, "\\s+")) %>%  # split each sentence into words
  unnest(Synonyms) %>%  # expand to long form, one word per row
  left_join(dt2, by = "Synonyms") %>%  # look up each word among the synonyms
  mutate(Word = ifelse(is.na(Word), Synonyms, Word)) %>%  # keep unmatched words unchanged
  group_by(sentence) %>%
  summarise(sentence2 = paste(Word, collapse = " "))  # reconstruct the sentences
mydf2
sentence sentence2
<chr> <chr>
1 I can cover these things I can Hide these things
2 I can utilize this file I can Use this file
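The same split, look up and reassemble idea can be sketched in base R without the tidyverse, using a named character vector as the lookup table. This is an illustrative alternative, not part of the original answer; it assumes the question's dt and mydf and exact, case-sensitive matching of whitespace-separated words (so a word with attached punctuation, such as "cover,", would not be matched). Unlike group_by, it also keeps the sentences in their original order:

```r
# Data from the question
dt <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "Word Synonyms
Use 'employ, utilize, exhaust, spend, expend, consume, exercise'
Come 'advance, approach, arrive, near, reach'
Go 'depart, disappear, fade, move, proceed, recede, travel'
Run 'dash, escape, elope, flee, hasten, hurry, race, rush, speed, sprint'
Hurry 'rush, run, speed, race, hasten, urge, accelerate, bustle'
Hide 'conceal, cover, mask, cloak, camouflage, screen, shroud, veil'
")
mydf <- read.table(header = TRUE, stringsAsFactors = FALSE,
text = "sentence
'I can utilize this file'
'I can cover these things'
")

# Named lookup vector: names are synonyms, values are the Word they map to
syn <- strsplit(dt$Synonyms, ",\\s*")
lookup <- setNames(rep(dt$Word, lengths(syn)), unlist(syn))

# Hypothetical helper: split on whitespace, swap known synonyms, paste back
replace_synonyms <- function(sentence) {
  words <- strsplit(sentence, "\\s+")[[1]]
  hit <- words %in% names(lookup)
  # Indexing by name returns the first match, so duplicated synonyms
  # (e.g. "rush" under both Run and Hurry) resolve to the earlier Word
  words[hit] <- unname(lookup[words[hit]])
  paste(words, collapse = " ")
}

mydf$sentence2 <- vapply(mydf$sentence, replace_synonyms, character(1),
                         USE.NAMES = FALSE)
print(mydf)
```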