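I'm really putting time into learning regex, and I'm playing with different toy scenarios. One setup I can't get to work is grabbing everything from the beginning of a string up to the n-th occurrence of a character, where n > 1.
Here I can grab from the beginning of the string up to the first underscore, but I can't generalize it to the second or third underscore.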
x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")
gsub("_.*$", "", x)
Here's what I'm trying to achieve with regex (`sub`/`gsub`):
## > sapply(lapply(strsplit(x, "_"), "[", 1:2), paste, collapse="_")
## [1] "a_b" "1_2" "<_?"
#or
## > sapply(lapply(strsplit(x, "_"), "[", 1:3), paste, collapse="_")
## [1] "a_b_c" "1_2_3" "<_?_."
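For reference, the `strsplit()` one-liners above can be wrapped into a function of `n` to check any regex answer against. This is a minimal sketch; the name `uptoNthRef` is hypothetical. It uses `head()` instead of `"["` with `1:n`, so strings with fewer than `n` pieces don't pick up `NA`s, and `fixed = TRUE` so the separator is taken literally rather than as a regex.

```r
# Non-regex reference: keep everything before the n-th occurrence of `sep`.
# `uptoNthRef` is a hypothetical helper name, not from the question.
uptoNthRef <- function(x, n, sep = "_") {
  pieces <- strsplit(x, sep, fixed = TRUE)  # literal split, no escaping needed
  vapply(pieces, function(p) paste(head(p, n), collapse = sep), character(1))
}

x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")
uptoNthRef(x, 2)
## [1] "a_b" "1_2" "<_?"
uptoNthRef(x, 3)
## [1] "a_b_c" "1_2_3" "<_?_."
```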
Answer 0 (score: 5)
Here's a start. To make it safe for general use, you'd need it to properly escape regex special characters:
x <- c("a_b_c_d", "1_2_3_4", "<_?_._:", "", "abcd", "____abcd")
matchToNth <- function(char, n) {
  others <- paste0("[^", char, "]*") ## matches "[^_]*" if char is "_"
  mainPat <- paste0(c(rep(c(others, char), n-1), others), collapse="")
  paste0("(^", mainPat, ")", "(.*$)")
}
gsub(matchToNth("_", 2), "\\1", x)
# [1] "a_b" "1_2" "<_?" "" "abcd" "_"
gsub(matchToNth("_", 3), "\\1", x)
# [1] "a_b_c" "1_2_3" "<_?_." "" "abcd" "__"
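One possible way to add the escaping mentioned above is sketched below, assuming a single-character `char` and switching to PCRE (`perl = TRUE`). The helper names `escapeRegex` and `matchToNthSafe` are hypothetical. In PCRE a backslash before any non-alphanumeric character means "this character literally", both inside and outside `[...]`, so escaping every non-alphanumeric character is safe even when it ends up inside the negated class.

```r
# Hypothetical helper: backslash-escape every non-alphanumeric character.
# Relies on PCRE semantics, so patterns built from it need perl = TRUE.
escapeRegex <- function(s) gsub("([^[:alnum:]])", "\\\\\\1", s, perl = TRUE)

# matchToNth from the answer, but escaping `char` before building the pattern.
# Assumes `char` is a single character (it is also used inside [^...]).
matchToNthSafe <- function(char, n) {
  ch <- escapeRegex(char)
  others <- paste0("[^", ch, "]*")
  mainPat <- paste0(c(rep(c(others, ch), n - 1), others), collapse = "")
  paste0("(^", mainPat, ")", "(.*$)")
}

y <- c("a.b.c.d", "1.2.3.4")
sub(matchToNthSafe(".", 2), "\\1", y, perl = TRUE)
## [1] "a.b" "1.2"
```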
Answer 1 (score: 3)
How about:
gsub('^(.+_.+?).*$', '\\1', x)
# [1] "a_b" "1_2" "<_?"
Alternatively, you can use `{}` to specify the number of repetitions...
sub('((.+_){1}.+?).*$', '\\1', x) # {0} will give "a", {1} - "a_b", {2} - "a_b_c" and so on
...so if you want to match up to the n-th one, you don't have to repeat yourself.
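The repetition count can also be built into the pattern with `sprintf()` so `n` becomes an argument. This is a sketch, not the answer's exact regex: it swaps the lazy `.+?` for a negated class `[^_]*` so the result doesn't depend on how a given regex engine treats lazy quantifiers, and uses `perl = TRUE` for the non-capturing group `(?:...)`. The function name `uptoNth` is hypothetical.

```r
# Keep everything before the n-th underscore.
# {n - 1} counts the "chunk followed by _" groups, as in the answer's {0}, {1}, {2}.
# Strings with fewer than n - 1 underscores don't match and are returned unchanged.
uptoNth <- function(x, n) {
  pat <- sprintf("^((?:[^_]*_){%d}[^_]*).*$", as.integer(n - 1))
  sub(pat, "\\1", x, perl = TRUE)
}

x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")
uptoNth(x, 1)
## [1] "a" "1" "<"
uptoNth(x, 2)
## [1] "a_b" "1_2" "<_?"
uptoNth(x, 3)
## [1] "a_b_c" "1_2_3" "<_?_."
```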
Answer 2 (score: 1)
Up to the second underscore, in Perl-style regex (note the match includes that underscore):
/^(.*?_.*?_)/
and up to the third:
/^(.*?_.*?_.*?_)/
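To use these Perl-style patterns from R, pass `perl = TRUE` (PCRE) and extract the match with `regmatches()`. Because the match ends with the underscore, a final `sub()` strips it to get the output asked for in the question. Note that `regmatches()` drops elements that don't match, so the result can be shorter than `x` if some strings have too few underscores.

```r
x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")

# PCRE supports the lazy .*? used in the answer's pattern
m <- regmatches(x, regexpr("^(.*?_.*?_)", x, perl = TRUE))
m
## [1] "a_b_" "1_2_" "<_?_"

# drop the trailing underscore
sub("_$", "", m)
## [1] "a_b" "1_2" "<_?"
```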
Answer 3 (score: 1)
Maybe something like this:
x
## [1] "a_b_c_d" "1_2_3_4" "<_?_._:"
gsub("(.*)_", "\\1", regmatches(x, regexpr("([^_]*_){1}", x)))
## [1] "a" "1" "<"
gsub("(.*)_", "\\1", regmatches(x, regexpr("([^_]*_){2}", x)))
## [1] "a_b" "1_2" "<_?"
gsub("(.*)_", "\\1", regmatches(x, regexpr("([^_]*_){3}", x)))
## [1] "a_b_c" "1_2_3" "<_?_."
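The same idea can be wrapped so that `n` is a parameter and the output keeps the length of the input. This is a minimal sketch with a hypothetical name, `uptoNthMatch`; it anchors the pattern with `^`, fills non-matching elements with `NA` instead of letting `regmatches()` drop them, and uses `sub("_$", "", ...)` in place of the answer's `gsub("(.*)_", "\\1", ...)` (both remove the final underscore).

```r
# Keep everything before the n-th underscore; NA where there are fewer than n.
uptoNthMatch <- function(x, n) {
  pos <- regexpr(sprintf("^([^_]*_){%d}", n), x)  # -1 where there is no match
  out <- rep(NA_character_, length(x))
  out[pos > 0] <- sub("_$", "", regmatches(x, pos))
  out
}

x2 <- c("a_b_c_d", "abcd", "1_2_3_4")
uptoNthMatch(x2, 2)
## [1] "a_b" NA    "1_2"
```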
Answer 4 (score: 1)
Using Justin's approach, this is what I came up with:
beg2char <- function(text, char = " ", noc = 1, include = FALSE) {
  # escape regex metacharacters first, so `inc` below also uses the escaped form
  specchar <- c(".", "|", "(", ")", "[", "{", "^", "$", "*", "+", "?", "\\")
  if (char %in% specchar) {
    char <- paste0("\\", char)
  }
  inc <- if (include) char else "?"
  ins <- paste(rep(paste0(char, ".+"), noc - 1), collapse = "")
  pat <- paste0("^(.+", ins, inc, ").*$")
  gsub(pat, "\\1", text)
}
x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")
beg2char(x, "_", 1)
beg2char(x, "_", 2)
beg2char(x, "_", 3)
beg2char(x, "_", 4)
beg2char(x, "_", 3, include=TRUE)