I want to replace every word that is repeated back to back in my strings with a single occurrence of that word.
My strings look like this:
text_strings <- c("We have to extract these numbers 12, 47, 48", "The integers numbers are also interestings: 189 2036 314",
"','is a separator, so please extract these numbers 125,789,1450 and also these 564,90456", "We like to to offer you 7890$ per month in order to complete this task... we are joking", "You are going to learn 3 things, the first one is not to extract, and 2 and 3 are simply digits.", "Have fun with our mighty test, you are going to support science, progress, mankind wellness and you are going to waste 30 or 60 minutes of your life.", "you can also extract exotic stuff like a456 gb67 and 45678911ghth", "Writing 1 example is not funny, please consider that 66% is validation+testing", "You you are a genius, I think that you like arrays A LOT, [3,45,67,900,1974]", "Who loves arrays more than me?", "{366,78,90,5}Yes, there are only 4 numbers inside", "Integers are fine but sometimes you like 99 cents after the 99 dollars", "100€ are better than 99€", "I like to give you 1000 numbers now: 12 3 56 21 67, and more, [45,67,7]", "Ok ok 1 2 3 4 5 and the last one is 6", "33 trentini entrarono a Trento, tutti e 33 di tratto in tratto trotterellando")
I tried:
gsub("\b(?=\\w*(\\w)\1)\\w+", "\\w", text_strings, perl = TRUE)
But nothing happens (the output stays the same).
How can I remove the repeated words, for example in
text_strings[9]
#[1] "You you are a genius, I think that you like arrays A LOT, [3,45,67,900,1974]"
Thanks!
Answer 0 (score: 2)
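You can use gsub with a regular expression that contains a backreference: \\b(\\w+) captures a word, \\W+ matches the separator after it, and \\1 matches the same word again. With ignore.case=TRUE, "You you" counts as a repeat too.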
gsub("\\b(\\w+)\\W+\\1", "\\1", text_strings, ignore.case=TRUE, perl=TRUE)
[1] "We have to extract these numbers 12, 47, 48"
[2] "The integers numbers are also interestings: 189 2036 314"
[3] "','is a separator, so please extract these numbers 125,789,1450 and also these 564,90456"
[4] "We like to offer you 7890$ per month in order to complete this task... we are joking"
[5] "You are going to learn 3 things, the first one is not to extract, and 2 and 3 are simply digits."
[6] "Have fun with our mighty test, you are going to support science, progress, mankind wellness and you are going to waste 30 or 60 minutes of your life."
[7] "you can also extract exotic stuff like a456 gb67 and 45678911ghth"
[8] "Writing 1 example is not funny, please consider that 66% is validation+testing"
[9] "You are a genius, I think that you like arrays A LOT, [3,45,67,900,1974]"
[10] "Who loves arrays more than me?"
[11] "{366,78,90,5}Yes, there are only 4 numbers inside"
[12] "Integers are fine but sometimes you like 99 cents after the 99 dollars"
[13] "100€ are better than 99€"
[14] "I like to give you 1000 numbers now: 12 3 56 21 67, and more, [45,67,7]"
[15] "Ok 1 2 3 4 5 and the last one is 6"
[16] "33 trentini entrarono a Trento, tutti e 33 di tratto in tratto trotterellando"
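Two details are worth checking before reusing this. First, inside an R string literal "\b" is a backspace character and "\1" is an octal escape, so the pattern in the question never reaches the regex engine as \b or \1, which would explain why nothing changed. Second, the pattern above has no word boundary after the backreference, so it would also shorten "the theory" to "theory", and a word repeated three times would still leave one duplicate behind. The following is a minimal sketch of a stricter variant, not part of the original answer; the helper name collapse_repeats is hypothetical.

```r
# Single vs. double backslash in an R string literal:
nchar("\b")    # 1: a single backspace character, not a regex word boundary
nchar("\\b")   # 2: a backslash followed by "b", which the regex engine reads as \b

# Hypothetical helper (name chosen for illustration only).
collapse_repeats <- function(x) {
  # \\b(\\w+)        a whole word, captured as group 1
  # (?:\\W+\\1\\b)+  one or more further copies of that word, each ending
  #                  at a word boundary so "the theory" is left alone
  gsub("\\b(\\w+)(?:\\W+\\1\\b)+", "\\1", x, ignore.case = TRUE, perl = TRUE)
}

collapse_repeats("You you are a genius")        # "You are a genius"
collapse_repeats("We like to to to offer you")  # "We like to offer you"
collapse_repeats("this is the theory")          # unchanged
```

The separator \\W+ and the repeated copies are dropped along with the duplicate, so only the first occurrence of the word survives, with its original capitalization.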