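I would like a function that finds the calls to a given function in a string, extracts their raw arguments, and replaces each call with a placeholder. Unfortunately, my regex skills did not get me very far...
I would like the following behavior: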
extract_fun("max(7*xy,b=z)+maximum+max(j)",fun="max")
# $modified_string
# [1] "{F[[1]]}+maximum+{F[[2]]}"
#
# $params
# $params[[1]]
# [1] "7*xy" "b=z"
#
# $params[[2]]
# [1] "j"
EDIT:
A more complex use case:
extract_fun("max(7*xy,b=min(1,3))+maximum+max(j)",fun="max")
# $modified_string
# [1] "{F[[1]]}+maximum+{F[[2]]}"
#
# $params
# $params[[1]]
# [1] "7*xy" "b=min(1,3)"
#
# $params[[2]]
# [1] "j"
Answer 0 (score: 2)
Here is something to get you started.
Your function should take two arguments:
fun = "max"
string = "max(7*xy,b=z)+maximum+max(j)"
The regex captures anything between `(` and `)` that is preceded by `fun`; the `?` makes the match lazy:
regex = paste0(fun, "\\((.*?)\\)")
regex
#output
"max\\((.*?)\\)"
matcher = stringr::str_match_all(string, regex)
matcher = do.call(rbind, matcher)
matcher
#output
     [,1]            [,2]
[1,] "max(7*xy,b=z)" "7*xy,b=z"
[2,] "max(j)"        "j"
#extract arguments from captured groups in matcher[,2]
params = strsplit(matcher[,2], " {0,}, {0,}" ) #, with possible white spaces before and after
#output
[[1]]
[1] "7*xy" "b=z"
[[2]]
[1] "j"
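The capture-and-split steps above can also be sketched in base R, without stringr. This is only an illustration: `get_params_simple` is a made-up name, and the `\\b` word boundary is an addition that is not in the answer (it stops `max` from matching inside a longer name such as `mymax(`). It assumes `fun` contains no regex metacharacters.

```r
# Hypothetical helper (not from the answer): base-R version of the two steps above.
get_params_simple <- function(string, fun) {
  # \\b stops "max" from matching inside a longer name such as "mymax("
  regex <- paste0("\\b", fun, "\\((.*?)\\)")
  calls <- regmatches(string, gregexpr(regex, string, perl = TRUE))[[1]]
  # strip the leading "fun(" and the trailing ")" to keep the argument text
  inner <- substr(calls, nchar(fun) + 2, nchar(calls) - 1)
  # split at commas, swallowing optional spaces around them
  strsplit(inner, " *, *")
}

get_params_simple("max(7*xy,b=z)+maximum+max(j)", "max")

# The lazy match stops at the first ")", so a nested call is cut short:
get_params_simple("max(7*xy,b=min(1,3))+max(j)", "max")[[1]]
# [1] "7*xy"    "b=min(1" "3"
```

The second call shows why the lazy regex alone cannot handle arguments that contain parentheses of their own.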
#generate a modified_string
Fs = seq_len(nrow(matcher))
replacer = paste0("{F[[", Fs, "]]}")
regex2 = matcher[,1]
out = string
for (i in seq_along(replacer)){
  #sub (not gsub) so that a call occurring twice gets two different placeholders
  out = sub(regex2[i], replacer[i], out, fixed = TRUE)
}
out
#output
"{F[[1]]}+maximum+{F[[2]]}"
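EDIT: here is what I have so far for the updated question.
The idea is to isolate the part of the string that contains the function of interest and then manipulate only that part.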
string = "max(7*xy,b=min(1,3))+maximum+max(j)"
Split the string just before each `max(`:
fun = "max"
regex_for_split = paste0("(?<=.)(?=", fun, "\\()")
fun_char = nchar(fun)
spliter_begin = unlist(strsplit(string, regex_for_split, perl = TRUE))
Locate the opening and closing parentheses:
opening = stringr::str_locate_all(spliter_begin, "\\(")
ending = stringr::str_locate_all(spliter_begin, "\\)")
Clean up a bit (keep only the start positions):
opening = lapply(opening, function(x) x[,1])
ending = lapply(ending, function(x) x[,1])
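Find the closing parenthesis at which the number of closing parentheses equals the number of opening ones. We are interested in the first such match.
out = list()
for (i in seq_along(ending)){
  end = ending[[i]]
  open = opening[[i]]
  sumer = vector()
  for(z in end){
    #TRUE when every "(" opened before z has been closed at z
    sumi = sum(open < z) == sum(end <= z)
    sumer = c(sumer, sumi)
  }
  out[[i]] = sumer
}
#keep only the balanced closing positions of each chunk
spliter_end = purrr::map2(ending, out, function(x, y){
  return(x[y])
})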
Isolate the substrings:
fun_isolate = purrr::map2(as.list(spliter_begin), spliter_end, function(x,y){
substr(x, start = fun_char+2, stop = y[1]-1)
})
fun_isolate
#output
[[1]]
[1] "7*xy,b=min(1,3)"
[[2]]
[1] "j"
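The balancing rule used above (the call ends at the first closing parenthesis where opens and closes even out) can also be written as a running depth counter. The following is a minimal base-R sketch under that reading; `find_closing_paren` and its arguments are hypothetical names, not part of the answer.

```r
# Hypothetical helper: position of the ")" that closes the "(" at `open_pos`,
# or NA if the parentheses are unbalanced.
find_closing_paren <- function(s, open_pos) {
  chars <- strsplit(s, "")[[1]]
  # +1 for every "(", -1 for every ")", 0 otherwise
  step <- (chars == "(") - (chars == ")")
  step[seq_len(open_pos - 1)] <- 0          # ignore everything before the call
  depth <- cumsum(step)
  # the call ends at the first later ")" that brings the depth back to zero
  hit <- which(depth == 0 & chars == ")" & seq_along(chars) > open_pos)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

s <- "max(7*xy,b=min(1,3))+maximum+max(j)"
close_pos <- find_closing_paren(s, 4)       # the "(" of the first max( is character 4
substr(s, 5, close_pos - 1)
# [1] "7*xy,b=min(1,3)"
```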
Let's try a more complex example:
string2 = "max(7*xy,b=min(1,3),z=sum(x*y)),mean(x+y)+maximum+max(j)"
#copy above code with `string2` instead of `string`
fun_isolate
[[1]]
[1] "7*xy,b=min(1,3),z=sum(x*y)"
[[2]]
[1] "j"
Even tougher:
string3 = "max(7*xy,b=min(1,3, head(z)),z=sum(x*y+mean(x+y))),mean(x+y)+maximum+max(j)"
#output
[[1]]
[1] "7*xy,b=min(1,3, head(z)),z=sum(x*y+mean(x+y))"
[[2]]
[1] "j"
Now all that is left is to split at the `,` that are not inside `(` `)`.
#locate the parenthesised spans
locate_exclude = stringr::str_locate_all(unlist(fun_isolate), "\\(.*?\\)")
#locate all commas
locate_comma = stringr::str_locate_all(unlist(fun_isolate), ",")
#keep only the commas that fall outside the parenthesised spans
splt_locate = purrr::map2(locate_exclude, locate_comma, function(x, y){
  if(length(x)==0) x = matrix(data=c(0,0), nrow=1)
  offbounds = vector()
  for (i in 1 : nrow(x)){
    z = x[i,1]:x[i,2]
    offbounds = c(offbounds, z)
  }
  comas = y[,1]
  return(comas[!comas%in%offbounds])
})
#function to split string at indexes
split_str_by_index <- function(target, index) {
  index <- sort(index)
  substr(rep(target, length(index) + 1),
         start = c(1, index),
         stop = c(index -1, nchar(target)))
}
close_but_not_yet = purrr::map2(fun_isolate, splt_locate, function(x, y){
split_str_by_index(x, y)
})
close_but_not_yet
#output
[[1]]
[1] "7*xy" ",b=min(1,3, head(z))" ",z=sum(x*y+mean(x+y))"
[[2]]
[1] "j"
Then just remove the `,` at the start of each string, if there is one. For example:
lapply(close_but_not_yet , function(x) gsub("^, {0,}", "",x))
#output
[[1]]
[1] "7*xy" "b=min(1,3, head(z))" "z=sum(x*y+mean(x+y))"
[[2]]
[1] "j"
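One caveat with the lazy `\\(.*?\\)` pattern: each match ends at the first `)`, so in an argument such as `min(1, head(z), 3)` the comma before `3` is left unprotected and gets split. A possible alternative, sketched here in base R, is to split only at commas whose parenthesis depth is zero. `split_top_level` is a hypothetical name, and the sketch assumes that parentheses are balanced and that no commas or parentheses occur inside quoted strings.

```r
# Hypothetical helper: split an argument string at top-level commas only.
split_top_level <- function(args) {
  chars <- strsplit(args, "")[[1]]
  # nesting depth after each character: 0 means "outside every parenthesis"
  depth <- cumsum((chars == "(") - (chars == ")"))
  cuts <- which(chars == "," & depth == 0)
  starts <- c(1, cuts + 1)
  stops <- c(cuts - 1, length(chars))
  # trimws drops the spaces that may surround a separator
  trimws(substring(args, starts, stops))
}

split_top_level("7*xy,b=min(1, head(z), 3)")
# [1] "7*xy"                 "b=min(1, head(z), 3)"
```

Because the commas themselves are dropped by the split, no clean-up of leading `,` is needed afterwards.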
This will not work if the function of interest is called inside itself:
"max(7*xy,b=min(1,3),z=max(x*y)),mean(x+y)+maximum+max(j)"
But even that is manageable if you exclude everything inside `(` `)` before the first strsplit, in the same way as for the `,` split above.
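Another way to cope with a call nested inside itself is to skip the regex split and scan the string once while tracking parenthesis depth. The sketch below is not the answer's code: `extract_fun` only follows the signature requested in the question, every internal name is made up for illustration, and it assumes that `fun` contains no regex metacharacters and that no parentheses or commas occur inside quoted strings.

```r
# Sketch only: depth-tracking alternative, not the code from the answer.
extract_fun <- function(string, fun) {
  chars <- strsplit(string, "")[[1]]
  idx <- seq_along(chars)
  # nesting depth right after each character
  depth <- cumsum((chars == "(") - (chars == ")"))
  # start of every "fun(" at a word boundary; -1 means no match
  starts <- as.integer(gregexpr(paste0("\\b", fun, "\\("), string, perl = TRUE)[[1]])
  pieces <- character(0)
  params <- list()
  pos <- 1                                  # first character not yet copied to the output
  for (s in starts[starts > 0]) {
    if (s < pos) next                       # this call sits inside one already extracted
    open_pos <- s + nchar(fun)              # index of the "(" of this call
    # its ")" is the first later ")" that brings the depth back down
    close_pos <- which(idx > open_pos & chars == ")" & depth == depth[open_pos] - 1)[1]
    if (is.na(close_pos)) break             # unbalanced parentheses: leave the rest as is
    # argument separators are the commas directly inside this call
    cuts <- which(idx > open_pos & idx < close_pos & chars == "," & depth == depth[open_pos])
    args <- substring(string, c(open_pos + 1, cuts + 1), c(cuts - 1, close_pos - 1))
    params[[length(params) + 1]] <- trimws(args)
    pieces <- c(pieces, substr(string, pos, s - 1), sprintf("{F[[%d]]}", length(params)))
    pos <- close_pos + 1
  }
  pieces <- c(pieces, substr(string, pos, nchar(string)))
  list(modified_string = paste(pieces, collapse = ""), params = params)
}

res <- extract_fun("max(7*xy,b=min(1,3),z=max(x*y)),mean(x+y)+maximum+max(j)", fun = "max")
res$modified_string
# [1] "{F[[1]]},mean(x+y)+maximum+{F[[2]]}"
res$params[[1]]
# [1] "7*xy"       "b=min(1,3)" "z=max(x*y)"
```

Only the outermost calls are replaced; an inner `max(...)` stays in the argument text, where the same function could be applied again if needed.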
Tested on:
"max(7*xy,b=min(1,3, head(z)),z=sum(x*y+mean(x+y))),mean(x+y)+maximum+max(j)"
"max(7*xy,b=min(1,3, head(z)),z=sum(x*y+mean(x+y))),mean(x+y)+maximum+max(j*z+sum(a*b^sum(z)), drop = 72)"
"max(7*xy,b=min(1,3, head(z)),z=sum(x*y, mean(x+y))),mean(x+y)+maximum+max(j*z+sum(a*b^sum(z)), drop = 72)"