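I want to extract a few pieces of data from a string using a single regular expression. I wrote a pattern that includes those pieces as parenthesized subexpressions. In a Perl-like environment I would simply assign these subexpressions to variables with code like myvar1=$1; myvar2=$2; and so on, but how do I do this in R?
So far, the only way I have found to access these matches is regexec. That is not very convenient, because regexec does not support Perl syntax, among other reasons. This is what I am doing now: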
getoccurence <- function(text, rex, n) {
  # rex is the result of regexec(); element 1 is the whole match,
  # so capture group n sits at index n + 1
  occstart <- rex[[1]][n + 1]
  occstop <- occstart + attr(rex[[1]], 'match.length')[n + 1] - 1
  occtext <- substr(text, occstart, occstop)
  return(occtext)
}
mytext <- "junk text, 12.3456, -01.234, valuable text before comma, all the rest"
mypattern <- "([0-9]+\\.[0-9]+), (-?[0-9]+\\.[0-9]+), (.*),"
rez <- regexec(mypattern, mytext)
var1 <- getoccurence(mytext, rez, 1)
var2 <- getoccurence(mytext, rez, 2)
var3 <- getoccurence(mytext, rez, 3)
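Obviously this is a very clumsy solution, and there should be something better. I would appreciate any suggestions.
Answer 0 (score: 2)
Have you looked at regmatches?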
> regmatches(mytext, rez)
[[1]]
[1] "12.3456, -01.234, valuable text before comma," "12.3456"
[3] "-01.234" "valuable text before comma"
> sapply(regmatches(mytext, rez), function(x) x[4])
[1] "valuable text before comma"
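To get back to the Perl-style assignment from the question, the two calls can be chained and the groups picked out by position. This is only a minimal sketch that reuses mytext and mypattern from the question; the name groups is introduced here for illustration.

```r
mytext <- "junk text, 12.3456, -01.234, valuable text before comma, all the rest"
mypattern <- "([0-9]+\\.[0-9]+), (-?[0-9]+\\.[0-9]+), (.*),"

# regmatches() returns one character vector per input string:
# element 1 is the whole match, elements 2 to 4 are the capture groups
groups <- regmatches(mytext, regexec(mypattern, mytext))[[1]]

var1 <- groups[2]  # first capture group
var2 <- groups[3]  # second capture group
var3 <- groups[4]  # third capture group
```

The offset of one is the same reason x[4] returns the third group in the sapply call above.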
Answer 1 (score: 1)
In stringr, this is str_match, or str_match_all if you want every occurrence of the pattern in the string. str_match returns a matrix; str_match_all returns a list of matrices.
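As a rough sketch of how that might look with the data from the question (assuming the stringr package is installed; the names m and all_m are made up for this example):

```r
library(stringr)  # assumption: stringr is installed

mytext <- "junk text, 12.3456, -01.234, valuable text before comma, all the rest"
mypattern <- "([0-9]+\\.[0-9]+), (-?[0-9]+\\.[0-9]+), (.*),"

# str_match() returns a character matrix with one row per input string:
# column 1 is the whole match, columns 2 onward are the capture groups
m <- str_match(mytext, mypattern)
var1 <- m[1, 2]
var2 <- m[1, 3]
var3 <- m[1, 4]

# str_match_all() returns a list with one matrix per input string,
# each holding one row per occurrence of the pattern
all_m <- str_match_all(mytext, mypattern)
```

Since the matrix has one row per input string, the same column indexing should extend to a whole vector of strings, e.g. m[, 2] for the first group of each.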
Answer 2 (score: 1)
strapply and strapplyc (from the gsubfn package) can do it in one step:
> strapplyc(mytext, mypattern)
[[1]]
[1] "12.3456" "-01.234"
[3] "valuable text before comma"
> # with simplify = c argument
> strapplyc(mytext, mypattern, simplify = c)
[1] "12.3456" "-01.234"
[3] "valuable text before comma"
> # extract second element only
> strapply(mytext, mypattern, ... ~ ..2)
[[1]]
[1] "-01.234"
> # specify function slightly differently and use simplify = c
> strapply(mytext, mypattern, ... ~ list(...)[2], simplify = c)
[1] "-01.234"
> # same
> strapply(mytext, mypattern, x + y + z ~ y, simplify = c)
[1] "-01.234"
> # same but also convert to numeric - also can use with other variations above
> strapply(mytext, mypattern, ... ~ as.numeric(..2), simplify = c)
[1] -1.234
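In the examples above, the third argument can be a function or, as in these examples, a formula that is converted to a function (the LHS gives the arguments and the RHS is the body).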
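To make the formula-to-function correspondence concrete, here is a small sketch that writes the x + y + z ~ y formula out as an ordinary function. It assumes the gsubfn package is installed, and the names second_formula and second_function are only illustrative.

```r
library(gsubfn)  # assumption: gsubfn is installed

mytext <- "junk text, 12.3456, -01.234, valuable text before comma, all the rest"
mypattern <- "([0-9]+\\.[0-9]+), (-?[0-9]+\\.[0-9]+), (.*),"

# formula form: arguments on the left-hand side, body on the right-hand side
second_formula <- strapply(mytext, mypattern, x + y + z ~ y, simplify = c)

# the same thing as an explicit function of the three capture groups
second_function <- strapply(mytext, mypattern, function(x, y, z) y, simplify = c)
```

Both calls should give the second capture group; the formula is just a shorter way to write the function.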