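String replacement using the stringr package
I want to remove everything in each string up to and including ": " or " | ", but my code does not give me the expected output.
Here is the sample data: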
x <- c("Q3: AGE", "Q4: COUNTRY", "Q5: STATE, PROVINCE, COUNTY, ETC",
"Q6 | 100 Grand Bar", "Q6 | Anonymous brown globs that come in black and
orange wrappers\t(a.k.a. Mary Janes)",
"Q6 | Any full-sized candy bar", "Q6 | Black Jacks")
Here is my R code:
x %>%
str_replace_all("(.*: | .*\\|)", "")
Here is my expected result:
x <- c("AGE", "COUNTRY", "STATE, PROVINCE, COUNTY, ETC",
"100 Grand Bar", "Anonymous brown globs that come in black and orange
wrappers\t(a.k.a. Mary Janes)",
"Any full-sized candy bar", "Black Jacks")
Answer 0 (score: 1)
Here is another regex:
gsub("^.*?(: |\\| )", "", x)
or
gsub("^.*?(:|\\|) ", "", x)
or
gsub("^.*?(:|\\|) ?", "", x) #if the vector contains mixed `:text`, `| text` without and with spaces
#output
[1] "AGE"
[2] "COUNTRY"
[3] "STATE, PROVINCE, COUNTY, ETC"
[4] "100 Grand Bar"
[5] "Anonymous brown globs that come in black and \norange wrappers\t(a.k.a. Mary Janes)"
[6] "Any full-sized candy bar"
[7] "Black Jacks"
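^.*? - matches as few characters as possible, starting at the beginning of the string
(: |\\| ) - up to and including the first ": " or "| "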
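To see why the pattern in the question falls short, and what the lazy quantifier changes, here is a minimal sketch. It uses base R's gsub and sub with perl = TRUE as a stand-in for str_replace_all (an assumption, made so the example needs no packages), and the string "Q6 | Salt: pepper" is a made-up input that is not part of the sample data.

```r
# Pattern from the question: " .*\\|" only matches from a space up to the "|",
# so the "Q6" prefix and the space after the "|" are left in place.
question_result <- gsub("(.*: | .*\\|)", "", "Q6 | 100 Grand Bar", perl = TRUE)
question_result
# "Q6 100 Grand Bar"

# Hypothetical input that contains both separators.
s <- "Q6 | Salt: pepper"

# Greedy: ".*" runs on to the LAST separator in the string.
greedy <- sub("^.*(: |\\| )", "", s, perl = TRUE)

# Lazy: ".*?" stops at the FIRST separator in the string.
lazy <- sub("^.*?(: |\\| )", "", s, perl = TRUE)

greedy  # "pepper"
lazy    # "Salt: pepper"
```

The lazy form is the safer choice when the text after the separator may itself contain a colon or a pipe.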
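Answer 1 (score: 0)
We can use sub to match, from the start (^) of the string, zero or more characters that are not a : or | ([^:|]*), followed by a : or (|) a | (escaped because | is a metacharacter meaning OR), followed by zero or more whitespace characters (\\s*), and replace the match with an empty string ("").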
sub("^[^:|]*(:|\\|)\\s*", "", x)
#[1] "AGE"
#[2] "COUNTRY"
#[3] "STATE, PROVINCE, COUNTY, ETC"
#[4] "100 Grand Bar"
#[5] "Anonymous brown globs that come in black and \norange wrappers\t(a.k.a. Mary Janes)"
#[6] "Any full-sized candy bar"
#[7] "Black Jacks"
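A small sketch of how this pattern treats a few edge cases. The three inputs below are invented for illustration and do not come from the question's data.

```r
# Hypothetical inputs: no space after ":", several spaces after "|",
# and a string with no separator at all.
y <- c("Q3:AGE", "Q6 |   Black Jacks", "no separator")

# Same pattern as above: prefix without ":" or "|", then the separator,
# then any amount of whitespace.
cleaned <- sub("^[^:|]*(:|\\|)\\s*", "", y)
cleaned
# "AGE" "Black Jacks" "no separator"
```

A string without a separator comes back unchanged, because sub only replaces when the whole pattern matches.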
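Answer 2 (score: 0)
Here is an approach that splits the strings instead of substituting (the split pattern passed to strsplit is still a regular expression):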
unlist(sapply(strsplit(x, ': | [|] '), function(i) paste(trimws(i[-1]), collapse = ' ')))
#[1] "AGE"
#[2] "COUNTRY"
#[3] "STATE, PROVINCE, COUNTY, ETC"
#[4] "100 Grand Bar"
#[5] "Anonymous brown globs that come in black and \norange wrappers\t(a.k.a. Mary Janes)"
#[6] "Any full-sized candy bar"
#[7] "Black Jacks"
#or with a slightly different regex than @akrun's solution,
sub('Q[0-9]+: |Q[0-9]+ \\| ', '', x)
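One difference between the two approaches in this answer shows up when the label text itself contains a separator. The sketch below uses a made-up label, "Q9: Other: please specify", which is not in the question's data.

```r
# Hypothetical label whose text itself contains ": ".
z <- "Q9: Other: please specify"

# strsplit splits at every ": ", so the label comes back in three pieces.
pieces <- strsplit(z, ": | [|] ")[[1]]
pieces
# "Q9" "Other" "please specify"

# Dropping the first piece and pasting with " " loses the inner ": ".
via_split <- paste(trimws(pieces[-1]), collapse = " ")
via_split
# "Other please specify"

# sub() replaces only the first match, so the inner ": " survives.
via_sub <- sub("Q[0-9]+: |Q[0-9]+ \\| ", "", z)
via_sub
# "Other: please specify"
```

For the sample data in the question both approaches give the same result; the sub version is the more conservative one if later labels might contain ": " or " | ".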