Words in sentences and their nearest neighbors in a lexicon

Time: 2015-02-05 10:36:17

Tags: r

I have the following data frame:

sent <- data.frame(words = c("just right size", "size love quality", "laptop worth price", "price amazing user",
                         "explanation complex what", "easy set", "product best buy", "buy priceless when"), user = c(1,2,3,4,5,6,7,8))

The data frame sent looks like this:

words                          user
just right size                 1
size love quality               2
laptop worth price              3
price amazing user              4
explanation complex what        5
easy set                        6
product best buy                7
buy priceless when              8

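I need to remove the word at the beginning of a sentence when it is the same as the word at the end of the previous sentence.

For example, we have the sentences "just right size" and "size love quality", so I need to remove the word "size" from the second user's row. Then we have the sentences "laptop worth price" and "price amazing user", so I need to remove the word "price" from the fourth user's row.

Can anyone help me? I would really appreciate it. Thank you very much in advance.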

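1 Answer:

Answer 0: (score: 0)

You could first extract the "first" word of each succeeding row and the "last" word of each current row from the "words" column using sub. If the two words are the same, remove the first word from the succeeding row; otherwise keep the row as it is (ifelse(...)).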

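# First word of each succeeding row (rows 2..n)
w1 <- sub(' .*', '', sent$words[-1])
# Last word of each preceding row (rows 1..n-1)
w2 <- sub('.* ', '', sent$words[-nrow(sent)])
# Convert to character (the column may be a factor) so rows can be modified in place
sent$words <- as.character(sent$words)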
sent$words
#[1] "just right size"          "size love quality"       
#[3] "laptop worth price"       "price amazing user"      
#[5] "explanation complex what" "easy set"                
#[7] "product best buy"         "buy priceless when"   

sent$words[-1] <- with(sent, ifelse(w1==w2, sub('\\w+ ', '',words[-1]), 
                  words[-1]))
sent$words
#[1] "just right size"          "love quality"            
#[3] "laptop worth price"       "amazing user"            
#[5] "explanation complex what" "easy set"                
#[7] "product best buy"         "priceless when"
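
The same steps could also be wrapped in a small reusable function. This is only a sketch: the name drop_repeated_lead is hypothetical (it is not part of the original answer or of base R), and it assumes that words are separated by single spaces, as in the example data.

```r
# Hypothetical helper: drop the leading word of each sentence when it equals
# the trailing word of the previous sentence. Assumes single-space separators.
drop_repeated_lead <- function(x) {
  x <- as.character(x)                  # also handles factor input
  n <- length(x)
  if (n < 2) return(x)                  # nothing to compare against
  first_next <- sub(" .*", "", x[-1])   # first word of elements 2..n
  last_prev  <- sub(".* ", "", x[-n])   # last word of elements 1..(n-1)
  dup <- first_next == last_prev
  # Remove the first word and the space after it where the words match;
  # a one-word sentence has no such space and is left unchanged.
  x[-1] <- ifelse(dup, sub("^\\S+ ", "", x[-1]), x[-1])
  x
}

# Usage on the question's data frame:
# sent$words <- drop_repeated_lead(sent$words)
```

The comparison always uses the original sentences, so removing a word from one row does not affect how the next row is checked.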