Regular expression operator to remove multiple strings

Time: 2017-10-27 11:54:07

Tags: r

I am working with strings using the stringr package.

I want to remove everything up to and including ": " or " | ", but my code does not give me the expected output.

Here is the sample data:

x <- c("Q3: AGE", "Q4: COUNTRY", "Q5: STATE, PROVINCE, COUNTY, ETC", 
"Q6 | 100 Grand Bar", "Q6 | Anonymous brown globs that come in black and 
orange wrappers\t(a.k.a. Mary Janes)", 
"Q6 | Any full-sized candy bar", "Q6 | Black Jacks")

Here is my R code:

x %>% 
str_replace_all("(.*: | .*\\|)", "")

Here is my expected result:

x <- c("AGE", "COUNTRY", "STATE, PROVINCE, COUNTY, ETC", 
"100 Grand Bar", "Anonymous brown globs that come in black and orange 
wrappers\t(a.k.a. Mary Janes)", 
"Any full-sized candy bar", "Black Jacks")

3 answers:

Answer 0: (score: 1)

Here is another regex:

gsub("^.*?(: |\\| )", "", x)

gsub("^.*?(:|\\|) ", "", x)

gsub("^.*?(:|\\|) ?", "", x) # if the vector mixes `:text` and `| text`, with or without a space after the separator
#output
[1] "AGE"                                                                                        
[2] "COUNTRY"                                                                                    
[3] "STATE, PROVINCE, COUNTY, ETC"                                                               
[4] "100 Grand Bar"                                                                              
[5] "Anonymous brown globs that come in black and \norange wrappers\t(a.k.a. Mary Janes)"
[6] "Any full-sized candy bar"                                                                   
[7] "Black Jacks"  

`^.*?` - matches as few characters as possible from the start of the string
`(: |\\| )` - followed by `: ` or `| `
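To make the role of the lazy quantifier concrete, here is a minimal base R sketch. The input `y` is a hypothetical string (not part of the sample data) that contains a second ": " later in the text, and `original_attempt` re-runs the pattern from the question through base `gsub` instead of `str_replace_all`, so it is only an approximation of the original call.

```r
# Hypothetical input with a second ": " after the prefix
y <- "Q7: NOTE: read this"

# Lazy: stops at the first separator after the start of the string
lazy <- sub("^.*?(: |\\| )", "", y)

# Greedy: runs on to the last separator in the string
greedy <- sub("^.*(: |\\| )", "", y)

# The pattern from the question: the second alternative only removes " |",
# so the "Q6" prefix survives
original_attempt <- gsub("(.*: | .*\\|)", "", "Q6 | 100 Grand Bar")

lazy              # "NOTE: read this"
greedy            # "read this"
original_attempt  # "Q6 100 Grand Bar"
```

The greedy form would also work on the sample data, since each element has only one separator; the lazy form is the safer choice when the remaining text may contain ": " or "| " itself.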

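Answer 1: (score: 0)

We can use sub to match zero or more characters that are not a `:` or `|` (`[^:|]*`) from the start (`^`) of the string, followed by a `:` or a `|` (`(:|\\|)`; the `|` is escaped because it is a metacharacter meaning OR), followed by zero or more whitespace characters (`\\s*`), and replace the match with an empty string (`""`).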

sub("^[^:|]*(:|\\|)\\s*", "", x)
#[1] "AGE"                                                                               
#[2] "COUNTRY"                                                                           
#[3] "STATE, PROVINCE, COUNTY, ETC"                                                      
#[4] "100 Grand Bar"                                                                     
#[5] "Anonymous brown globs that come in black and \norange wrappers\t(a.k.a. Mary Janes)"
#[6] "Any full-sized candy bar"                                                          
#[7] "Black Jacks"           
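As a small illustration of how the negated character class behaves at the edges, the sketch below applies the same pattern to two hypothetical inputs that are not in the sample data: one whose remaining text contains further colons, and one with no separator at all.

```r
pattern <- "^[^:|]*(:|\\|)\\s*"

# [^:|]* cannot cross a separator, so the match ends at the first "|"
kept_colon <- sub(pattern, "", "Q8 | Time: 10:30")

# With no ":" or "|" present the pattern cannot match, so sub() returns the input
untouched <- sub(pattern, "", "No prefix here")

kept_colon  # "Time: 10:30"
untouched   # "No prefix here"
```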

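Answer 2: (score: 0)

Here is an approach based on splitting rather than substitution (the split pattern is still a regular expression):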

unlist(sapply(strsplit(x, ': | [|] '), function(i) paste(trimws(i[-1]), collapse = ' ')))

#[1] "AGE"                                                                                      
#[2] "COUNTRY"                                                                                  
#[3] "STATE, PROVINCE, COUNTY, ETC"                                                             
#[4] "100 Grand Bar"                                                                            
#[5] "Anonymous brown globs that come in black and \n       orange wrappers\t(a.k.a. Mary Janes)"
#[6] "Any full-sized candy bar"                                                                 
#[7] "Black Jacks"

#or with a slightly different regex than @akrun's solution,

sub('Q[0-9]+: |Q[0-9]+ \\| ', '', x)
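To compare the approaches in this thread side by side, the following sketch runs each one on the sample data and checks the result against the expected vector. The names `results`, `all_match`, and the extra input `z` are illustrative, and the line break inside the fifth sample string is written here as an explicit `\n`.

```r
x <- c("Q3: AGE", "Q4: COUNTRY", "Q5: STATE, PROVINCE, COUNTY, ETC",
       "Q6 | 100 Grand Bar",
       "Q6 | Anonymous brown globs that come in black and \norange wrappers\t(a.k.a. Mary Janes)",
       "Q6 | Any full-sized candy bar", "Q6 | Black Jacks")

expected <- c("AGE", "COUNTRY", "STATE, PROVINCE, COUNTY, ETC",
              "100 Grand Bar",
              "Anonymous brown globs that come in black and \norange wrappers\t(a.k.a. Mary Janes)",
              "Any full-sized candy bar", "Black Jacks")

# One entry per approach shown in the answers
results <- list(
  lazy_prefix   = sub("^.*?(:|\\|) ?", "", x),
  negated_class = sub("^[^:|]*(:|\\|)\\s*", "", x),
  split_based   = unlist(sapply(strsplit(x, ": | [|] "),
                                function(i) paste(trimws(i[-1]), collapse = " "))),
  q_number      = sub("Q[0-9]+: |Q[0-9]+ \\| ", "", x)
)

# TRUE for every approach that reproduces the expected vector exactly
all_match <- vapply(results, identical, logical(1), expected)
all_match

# Hypothetical input with a second ": ": the split-based version drops it,
# while the sub()-based version keeps it
z <- "Q7: NOTE: read this"
split_extra <- paste(trimws(strsplit(z, ": | [|] ")[[1]][-1]), collapse = " ")
sub_extra   <- sub("^[^:|]*(:|\\|)\\s*", "", z)
```

On the sample data all four approaches agree. They differ only on inputs like `z`, where the split-based version replaces the later separator with a single space.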