R: Remove part of a word from strings

Date: 2017-12-14 10:10:50

Tags: r regex gsub tm stringr

I have a character vector:

words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")

I am trying to remove span AND punctuation from every word in the vector, to obtain:

> something thank great to hear your

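The problem is that there is no rule for whether span appears before or after the word I am interested in. In addition, span can be attached to: i) characters only (e.g. yourspan), ii) punctuation only (e.g. ..span?), or iii) both characters and punctuation (e.g. somethingspan.).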

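I searched SO for answers, but I mostly found requests to remove whole words, or to remove the part of a string that comes before or after a letter or punctuation mark.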

Any help would be appreciated.

3 answers:

Answer 0 (score: 2)

You can use

[[:punct:]]*span[[:punct:]]*

See the regex demo.

Details

  • [[:punct:]]* - zero or more punctuation characters
  • span - a literal substring
  • [[:punct:]]* - zero or more punctuation characters

R Demo

words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")
words <- gsub("[[:punct:]]*span[[:punct:]]*", "", words) # Remove spans
words <- words[words != ""] # Discard empty elements
paste(words, collapse=" ")  # Concat the elements
## => [1] "something thank great to hear your"

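If removing the unwanted strings leaves some elements that contain only whitespace, you can replace the second step with words <- words[trimws(words) != ""] (instead of words[words != ""]).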
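To illustrate the difference between the two filters, here is a minimal sketch. The input vector is hypothetical (it is not from the question): its second element has a space on each side of ..span?, so the substitution leaves a whitespace-only string behind.

```r
# Hypothetical input: the second element is padded with spaces
words <- c("somethingspan.", " ..span? ", "spanthank")
cleaned <- gsub("[[:punct:]]*span[[:punct:]]*", "", words)

# Comparing against "" keeps the whitespace-only element "  "
kept_plain <- cleaned[cleaned != ""]

# Trimming before the comparison drops it
kept_trim <- cleaned[trimws(cleaned) != ""]

print(kept_plain)
print(kept_trim)
```

With this input, only the trimws() filter should leave exactly the two real words.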

Answer 1 (score: 1)

You can try everything out at https://regex101.com/.

clean_words <- gsub(pattern = "span", replacement = "", words, perl = TRUE)
# if you want the sentence
sentence <- paste(clean_words, sep = " ", collapse = " ")

# to remove punctuation: this regex keeps only the letters a-z, A-Z and spaces
clean_sentence <- gsub(pattern = "[^a-zA-Z ]", replacement = "", sentence, perl = TRUE)
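One thing to watch for with this approach: an element such as ..span? contributes no letters at all, so the joined sentence ends up with two separators next to each other. The sketch below reproduces that and adds one extra clean-up step. That step is an assumption of this example, not part of the answer, and the names has_double_space and tidy_sentence are made up for illustration.

```r
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")

# Same steps as above: drop "span", join, then strip everything but letters and spaces
sentence <- paste(gsub("span", "", words), collapse = " ")
clean_sentence <- gsub("[^a-zA-Z ]", "", sentence)

# "..span?" became an empty piece, leaving two spaces side by side
has_double_space <- grepl("  ", clean_sentence, fixed = TRUE)

# Assumed extra step: collapse runs of spaces and trim both ends
tidy_sentence <- trimws(gsub(" +", " ", clean_sentence))
print(tidy_sentence)
```

Filtering out empty elements before calling paste() would be another way to avoid the doubled space.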

Answer 2 (score: 0)

Use sub to remove span. To turn the result into a sentence, use paste with collapse.

library(magrittr)

sub("^[[:punct:]]{,2}span|span[[:punct:]]{,2}$", "", words)  %>% paste(collapse=" ")

So it removes span only at the start or the end of each string.

Output

[1] "something ? thank great to hear your"
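The stray ? in this output comes from ..span?: each alternative in the pattern strips punctuation on one side of span only, so the trailing ? survives. A possible variant, sketched here as an assumption rather than as this answer's method, keeps the anchors but allows punctuation on both sides of span, then drops elements that become empty.

```r
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")

# Assumed variant: an anchored "span" with optional punctuation on both sides
pattern <- "^[[:punct:]]*span[[:punct:]]*|[[:punct:]]*span[[:punct:]]*$"
cleaned <- sub(pattern, "", words)

# "..span?" is now an empty string, so discard it before joining
cleaned <- cleaned[cleaned != ""]
result <- paste(cleaned, collapse = " ")
print(result)
```

Unlike the unanchored pattern from Answer 0, this variant would leave a span in the middle of a word untouched.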