I have the following data frame:
sent <- data.frame(words = c("just right size", "size love quality", "laptop worth price", "price amazing user",
"explanation complex what", "easy set", "product best buy", "buy priceless when"), user = c(1,2,3,4,5,6,7,8))
The sent data frame looks like this:
words                     user
just right size              1
size love quality            2
laptop worth price           3
price amazing user           4
explanation complex what     5
easy set                     6
product best buy             7
buy priceless when           8
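I need to remove the first word of a sentence when it is the same as the last word of the previous sentence.
For example, we have the sentences "just right size" and "size love quality", so I need to remove the word "size" for the second user. Likewise, for the sentences "laptop worth price" and "price amazing user", I need to remove the word "price" for the fourth user.
Can anyone help me? I would really appreciate it. Thank you very much in advance.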
Answer 0 (score: 0)
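You can first use sub to extract the "first" word of each subsequent row and the "last" word of each current row from the "words" column. If the two words are the same, remove the first word from the subsequent row; otherwise leave it as it is (ifelse(...)).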
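# first word of every sentence except the first one
w1 <- sub(' .*', '', sent$words[-1])
# last word of every sentence except the last one
w2 <- sub('.* ', '', sent$words[-nrow(sent)])
# convert from factor (the data.frame default before R 4.0) to character
sent$words <- as.character(sent$words)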
sent$words
#[1] "just right size" "size love quality"
#[3] "laptop worth price" "price amazing user"
#[5] "explanation complex what" "easy set"
#[7] "product best buy" "buy priceless when"
# where the words match, strip the leading word and its trailing space
sent$words[-1] <- with(sent, ifelse(w1 == w2, sub('\\w+ ', '', words[-1]),
                                    words[-1]))
sent$words
#[1] "just right size" "love quality"
#[3] "laptop worth price" "amazing user"
#[5] "explanation complex what" "easy set"
#[7] "product best buy" "priceless when"
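If you need this more than once, the same sub()/ifelse() steps can be wrapped in a small function. The following is a minimal sketch, not part of the original answer. The function name drop_repeated_first_word is hypothetical, and the sketch assumes that words are separated by single spaces, as in the example data. Like the code above, it compares each sentence with the unmodified previous sentence. That is safe here because removing a sentence's first word never changes its last word, except for a one-word sentence, which this regex leaves untouched.

```r
# Hypothetical helper (illustration only): applies the answer's logic to any
# character vector and returns the modified vector.
drop_repeated_first_word <- function(x) {
  x <- as.character(x)                    # also handles factor input
  if (length(x) < 2) return(x)            # nothing to compare
  first <- sub(" .*", "", x[-1])          # first word of sentences 2..n
  last  <- sub(".* ", "", x[-length(x)])  # last word of sentences 1..n-1
  # drop the leading word (and its space) only where it repeats the previous last word
  x[-1] <- ifelse(first == last, sub("^\\w+ ", "", x[-1]), x[-1])
  x
}

sent <- data.frame(
  words = c("just right size", "size love quality", "laptop worth price",
            "price amazing user", "explanation complex what", "easy set",
            "product best buy", "buy priceless when"),
  user = 1:8,
  stringsAsFactors = FALSE
)
sent$words <- drop_repeated_first_word(sent$words)
print(sent$words)
```

On the example data this should produce the same result as the step-by-step version above. Setting stringsAsFactors = FALSE avoids relying on the as.character() conversion, although the function performs that conversion anyway.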