Remove a specified pattern from a string in R

Date: 2016-06-21 15:20:58

Tags: regex r text

I have a string like the following:
s <- "abc a%bc 1.2% 234 1.2 (1.4%)) %3ed"

I want to remove every "word" that contains %. So the result would be

"abc 234 1.2"

4 answers:

Answer 0: (score: 4)

You can use the following (here `input` holds the string, called `s` in the question):

> gsub("^\\s+|\\s+$", "", (gsub("\\s+", " " ,gsub("\\s+\\S*%\\S*(?=\\s+|$)", " ",input, perl=TRUE))))
#[1] "abc 234 1.2"

Code breakdown

gsub("^\\s+|\\s+$", "", (gsub("\\s+", " " ,gsub("\\s+\\S*%\\S*(?=\\s+|$)", " ",input, perl=TRUE))))
                                           <--------------------------------------------------->
                                                     Remove strings with %
                        <------------------------------------------------------------------------>
                        Substitute extra spaces with single space from resultant string obtained from above
<-------------------------------------------------------------------------------------------------->
      Trim initial and final whitespaces from the string obtained from above

Regex breakdown

\\s+ # Match one or more whitespace characters
\\S* # Match any non-whitespace characters before %, if present
% # Match % literally
\\S* # Match any non-whitespace characters after %, if present
(?=\\s+|$) # Lookahead: the word containing % must be followed by whitespace or the end of the string
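
Because the pattern above begins with \\s+, a word containing % at the very start of the string is not matched. Below is a minimal sketch of a variant that also covers that case, using base R only. The helper name remove_pct_words is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer).
remove_pct_words <- function(x) {
  # (^|\\s+) lets a match begin at the start of the string as well as after whitespace
  out <- gsub("(^|\\s+)\\S*%\\S*", "", x, perl = TRUE)
  # Collapse any remaining runs of whitespace and trim both ends
  trimws(gsub("\\s+", " ", out))
}

s <- "abc a%bc 1.2% 234 1.2 (1.4%)) %3ed"
remove_pct_words(s)
# [1] "abc 234 1.2"
remove_pct_words("%a b")
# [1] "b"
```

Since \\S* is greedy and stops only at whitespace or the end of the string, this variant does not need the lookahead.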

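Answer 1: (score: 2)

You can use this

library(stringr)
s <- "abc a%bc 1.2% 234 1.2 (1.4%)) %3ed"
# Split the string into words on single spaces
words <- unlist(str_split(s, " "))
# str_locate() returns NA for words that do not contain "%"
ind <- which(is.na(str_locate(words, "%")[, 1]))
vec <- words[ind]
# Join the remaining words back into one string
res <- paste(vec, collapse = " ")
res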

Answer 2: (score: 2)

You can also use str_extract_all from the stringr package

stringr::str_extract_all(s, "(?<=^|\\s)[^%\\s]+(?=\\s|$)")
[[1]]
[1] "abc" "234" "1.2"

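(?<=^|\\s) asserts a position at the start of the string or right after a whitespace character;

[^%\\s]+ matches a word that contains neither % nor whitespace;

(?=\\s|$) asserts a position at the end of the string or right before a whitespace character.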
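
str_extract_all returns a list of character vectors, not the single string the question asks for. The sketch below joins the pieces back together with the same pattern. It swaps in base R's gregexpr and regmatches with perl = TRUE in place of stringr, so it is an alternative to the answer's code, not a restatement of it.

```r
s <- "abc a%bc 1.2% 234 1.2 (1.4%)) %3ed"
pattern <- "(?<=^|\\s)[^%\\s]+(?=\\s|$)"

# gregexpr() locates every match; regmatches() extracts the matched text
kept <- regmatches(s, gregexpr(pattern, s, perl = TRUE))[[1]]

# Join the surviving words with single spaces
result <- paste(kept, collapse = " ")
result
# [1] "abc 234 1.2"
```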

Answer 3: (score: 2)

How about this simple approach using base R:

s <- "abc a%bc 1.2% 234 1.2 (1.4%)) %3ed"
spl <- unlist(strsplit(s, " "))
spl[!grepl("%", spl)]

#[1] "abc" "234" "1.2"
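
The result above is a character vector, while the question asks for the single string "abc 234 1.2". A minimal sketch of the same split-and-filter idea wrapped in a function that pastes the words back together; the name drop_pct_words is hypothetical, and splitting on "\\s+" instead of a single space is an assumption made here to tolerate repeated whitespace.

```r
# Hypothetical helper built on the split-and-filter approach.
drop_pct_words <- function(x) {
  # Split on runs of whitespace (assumption: any whitespace separates words)
  words <- strsplit(trimws(x), "\\s+")[[1]]
  # Keep only the words without a literal "%"
  kept <- words[!grepl("%", words, fixed = TRUE)]
  # Rejoin with single spaces
  paste(kept, collapse = " ")
}

s <- "abc a%bc 1.2% 234 1.2 (1.4%)) %3ed"
drop_pct_words(s)
# [1] "abc 234 1.2"
```

Using fixed = TRUE treats "%" as a plain character, which avoids any regex interpretation of the search string.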