Question

I have a mathematical expression, for example:

((2-x+3)^2+(x-5+7)^10)^0.5

I need to replace the ^ operator with C's pow function. I think a regular expression is what I need, but I am no regex expert. This is the regex I ended up with:

(\([^()]*)*(\s*\([^()]*\)\s*)+([^()]*\))*

I don't know how to improve it. Can you suggest a way to solve this problem?

Expected output:

pow(pow(2-x+3,2)+pow(x-5+7,10),0.5)

Answer 1

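One of the most wonderful things about R is that you can easily manipulate R expressions with R itself. Here, we recursively traverse your expression and replace every instance of ^ with pow: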

f <- function(x) {
  if(is.call(x)) {
    if(identical(x[[1L]], as.name("^"))) x[[1L]] <- as.name("pow")
    if(length(x) > 1L) x[2L:length(x)] <- lapply(x[2L:length(x)], f)
  }
  x
}
f(quote(((2-x+3)^2+(x-5+7)^10)^0.5))

## pow((pow((2 - x + 3), 2) + pow((x - 5 + 7), 10)), 0.5)

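This should be more robust than the regex approach, because you rely on R's own parsing of the expression rather than on text patterns that may or may not be comprehensive.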
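As a small illustration of that robustness claim, the sketch below runs the same kind of walker on inputs that a parenthesis-based text pattern is not written for, such as a base without parentheses and chained powers. The name caret_to_pow is made up for this sketch; it is just a copy of f above so the snippet runs on its own.

```r
# Compact copy of the recursive walker, renamed for this sketch
caret_to_pow <- function(x) {
  if (is.call(x)) {
    if (identical(x[[1L]], as.name("^"))) x[[1L]] <- as.name("pow")
    if (length(x) > 1L) x[2L:length(x)] <- lapply(x[2L:length(x)], caret_to_pow)
  }
  x
}

# A base that is not wrapped in parentheses
deparse(caret_to_pow(quote(x^2 + y)))
## [1] "pow(x, 2) + y"

# Chained powers: ^ is right-associative in R, and the tree reflects that
deparse(caret_to_pow(quote(2^x^3)))
## [1] "pow(2, pow(x, 3))"
```

Because the parser has already decided precedence and associativity, the walker never needs to reason about where an operand starts or ends.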

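Details: calls in R are stored in list-like structures, with the function/operator at the head of the list and the arguments in the following elements. For example, consider: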

exp <- quote(x ^ 2)
exp
## x^2
is.call(exp)
## [1] TRUE

We can use as.list to inspect the underlying structure of the call:

str(as.list(exp))
## List of 3
##  $ : symbol ^
##  $ : symbol x
##  $ : num 2

As you can see, the first element is the function/operator, and the subsequent elements are the arguments to the function.
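Since a call can be indexed like a list, its operator can also be overwritten in place. A minimal sketch of that single step, using a throwaway variable e:

```r
# Build the call x^2 without evaluating it
e <- quote(x ^ 2)

# Element 1 is the operator; elements 2 and 3 are its arguments
e[[1L]] <- as.name("pow")   # swap the operator for the symbol `pow`

e
## pow(x, 2)
```

This is the whole trick; the recursive function only has to repeat it at every level of the expression.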

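So, in our recursive function, we:

Check whether the object is a call
- If yes: check whether it is a call to the ^ function/operator by looking at the first element of the call with identical(x[[1L]], as.name("^"))
  - If yes: replace the first element with as.name("pow")
  - Then, regardless of whether this was a call to ^ or anything else:
    - If the call has additional elements, loop through them and apply this function (i.e. recurse) to each element, writing the results back into the original call (x[2L:length(x)] <- lapply(x[2L:length(x)], f))
- If no: just return the object unchanged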

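Note that calls typically contain the name of the function as the first element. You can create those names with as.name. Names are also referred to as "symbols" in R (hence the output of str).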
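To make the name/symbol relationship concrete, here is a short sketch; the variables sym and cl are only illustrative.

```r
# as.name() and quote() produce the same symbol object
sym <- as.name("pow")
is.symbol(sym)                 # TRUE
identical(sym, quote(pow))     # TRUE

# A call can be assembled from a symbol plus its arguments
cl <- as.call(list(sym, quote(x), 2))
cl
## pow(x, 2)
```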

Answer 2

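Disclaimer: this answer was written with the OP's original regex in mind, when the question sounded like "process the balanced (nested) parentheses before ^". Please do not use this solution for generic math expression parsing; it is for educational purposes only, and only if you really need to process some text in a balanced-parentheses context.

Since a PCRE regex can match nested parentheses, this can be done in R with nothing more than a regex in a while loop, which checks whether the modified string still contains a matching ^ with x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE). Once there is no such ^ left, there is nothing else to substitute.

The regex pattern is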

(\(((?:[^()]++|(?1))*)\))\^(\d*\.?\d+)

See the regex demo.

Details:

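(\(((?:[^()]++|(?1))*)\)) - Group 1: a (...) substring with balanced parentheses, with the content inside the outer parentheses captured into Group 2 (by the ((?:[^()]++|(?1))*) subpattern). An explanation can be found in "How can I match nested brackets using regex?". In short, \( matches the outer (, then (?:[^()]++|(?1))* matches zero or more sequences of either 1+ characters other than ( and ) or the whole Group 1 subpattern ((?1) is a subroutine call), and then \) matches the closing )
\^ - a literal ^ caret
(\d*\.?\d+) - Group 3: an int/float number (.5, 1.5, 345)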

The replacement pattern contains a literal pow(), and \\2 and \\3 are backreferences to the substrings captured by Groups 2 and 3.

R code:

v <- "((2-x+3)^2+(x-5+7)^10)^0.5"
x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE)
while(x) {
    v <- sub("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", "pow(\\2, \\3)", v, perl=TRUE);
    x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE)
}
v
## => [1] "pow(pow(2-x+3, 2)+pow(x-5+7, 10), 0.5)"
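To see how the loop proceeds, the sketch below records the string after each substitution. The variables pat and steps are introduced only for this illustration; the pattern and replacement are the ones from the answer.

```r
pat <- "(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)"
v <- "((2-x+3)^2+(x-5+7)^10)^0.5"

steps <- character(0)            # the string after each substitution
while (grepl(pat, v, perl = TRUE)) {
  v <- sub(pat, "pow(\\2, \\3)", v, perl = TRUE)   # replace the leftmost match only
  steps <- c(steps, v)
}
steps
## [1] "pow((2-x+3)^2+(x-5+7)^10, 0.5)"
## [2] "pow(pow(2-x+3, 2)+(x-5+7)^10, 0.5)"
## [3] "pow(pow(2-x+3, 2)+pow(x-5+7, 10), 0.5)"
```

Each pass rewrites the leftmost match, so the outermost power is converted first and the inner ones follow.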

To support powers of the form ^(x-3), you can use

v <- sub("(\\(((?:[^()]++|(?1))*)\\))\\^(?|()(\\d*\\.?\\d+)|(\\(((?:[^()]++|(?3))*)\\)))", "pow(\\2, \\4)", v, perl=TRUE);

and, to check whether there are more values to replace:

x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(?|()(\\d*\\.?\\d+)|(\\(((?:[^()]++|(?3))*)\\)))", v, perl=TRUE)

Answer 3

Here is a solution that follows the parse tree recursively and replaces ^:

#parse the expression
#alternatively you could create it with
#expression(((2-x+3)^2+(x-5+7)^10)^0.5)
e <- parse(text = "((2-x+3)^2+(x-5+7)^10)^0.5")

#a recursive function
fun <- function(e) {    
  #check if you are at the end of the tree's branch
  if (is.name(e) || is.atomic(e)) { 
    #replace ^
    if (e == quote(`^`)) return(quote(pow))
    return(e)
  }
  #follow the tree with recursion
  for (i in seq_along(e)) e[[i]] <- fun(e[[i]])
  return(e)    
}

#deparse to get a character string    
deparse(fun(e)[[1]])
#[1] "pow((pow((2 - x + 3), 2) + pow((x - 5 + 7), 10)), 0.5)"

This would be much easier if rapply worked with expressions/calls.
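One way to gain confidence in a tree rewrite like this is to evaluate the original and the rewritten expression and compare the results. The sketch below assumes a stand-in pow function defined in R (C's pow is obviously not available here), and to_pow is a hypothetical compact walker written for this check, not the function from the answer.

```r
# Stand-in for C's pow, defined only so the rewritten call can be evaluated
pow <- function(base, exponent) base^exponent

# Hypothetical compact walker: rename ^ to pow, then recurse into the arguments
to_pow <- function(e) {
  if (is.call(e)) {
    if (identical(e[[1L]], as.name("^"))) e[[1L]] <- as.name("pow")
    for (i in seq_along(e)[-1L]) e[[i]] <- to_pow(e[[i]])
  }
  e
}

orig <- quote(((2-x+3)^2+(x-5+7)^10)^0.5)
conv <- to_pow(orig)

x <- 1
all.equal(eval(orig), eval(conv))
## [1] TRUE
```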

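Edit:

The OP has asked about performance. Performance is unlikely to be an issue for this task, but the regex solution is not faster than the others.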

library(microbenchmark)
microbenchmark(regex = {
  v <- "((2-x+3)^2+(x-5+7)^10)^0.5"
  x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE)
  while(x) {
    v <- sub("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", "pow(\\2, \\3)", v, perl=TRUE);
    x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE)
  }
},
BrodieG = {
  deparse(f(parse(text = "((2-x+3)^2+(x-5+7)^10)^0.5")[[1]]))
},
Roland = {
  deparse(fun(parse(text = "((2-x+3)^2+(x-5+7)^10)^0.5"))[[1]])
})

#Unit: microseconds
#    expr     min      lq     mean  median      uq     max neval cld
#   regex 321.629 323.934 335.6261 335.329 337.634 384.623   100   c
# BrodieG 238.405 246.087 255.5927 252.105 257.227 355.943   100  b
#  Roland 211.518 225.089 231.7061 228.802 235.204 385.904   100 a

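I haven't included the solution provided by @digEmAll, because it seems obvious that a solution with that many data.frame operations will be relatively slow.

Edit 2:

Here is a version that also handles sqrt.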

fun <- function(e) {
  #check if you are at the end of the tree's branch
  if (is.name(e) || is.atomic(e)) {
    #replace ^
    if (e == quote(`^`)) return(quote(pow))
    return(e)
  }
  if (e[[1]] == quote(sqrt)) {
    #replace sqrt
    e[[1]] <- quote(pow)
    #add the second argument
    e[[3]] <- quote(0.5)
  }
  #follow the tree with recursion
  for (i in seq_along(e)) e[[i]] <- fun(e[[i]])
  return(e)
}

e <- parse(text = "sqrt((2-x+3)^2+(x-5+7)^10)")
deparse(fun(e)[[1]])
#[1] "pow(pow((2 - x + 3), 2) + pow((x - 5 + 7), 10), 0.5)"

Answer 4

Here is an example that leverages the R parser (using the getParseData function):

# helper function which turns getParseData result back to a text expression
recreateExpr <- function(DF,parent=0){
  elements <- DF[DF$parent == parent,]
  s <- ""
  for(i in 1:nrow(elements)){
    element <- elements[i,]
    if(element$terminal)
      s <- paste0(s,element$text)
    else
      s <- paste0(s,recreateExpr(DF,element$id))
  }
  return(s)  
}

expr <- "((2-x+3)^2+(x-5+7)^10)^0.5"

DF <- getParseData(parse(text=expr))[,c('id','parent','token','terminal','text')]

# let's find the parents of all '^' expressions
parentsOfPow <- unique(DF[DF$token == "'^'",'parent'])

# replace all the 'x^y' expressions with 'pow(x,y)' 
for(p in parentsOfPow){
  idxs <- which(DF$parent == p)
  if(length(idxs) != 3){ stop("expression with '^' is not correct")  }

  idxtok1 <- idxs[1]
  idxtok2 <- idxs[2]
  idxtok3 <- idxs[3]

  # replace '^' token with 'pow'
  DF[idxtok2,c('token','text')] <- c('pow','pow')

  # move 'pow' token as first token in the expression
  tmp <- DF[idxtok1,]
  DF[idxtok1,] <- DF[idxtok2,]
  DF[idxtok2,] <- tmp

  # insert new terminals '(' ')' and ','
  DF <- rbind(
    DF[1:(idxtok2-1),],
    data.frame(id=max(DF$id)+1,parent=p,token='(',terminal=TRUE,text='(',
               stringsAsFactors=FALSE),
    DF[idxtok2,],
    data.frame(id=max(DF$id)+2,parent=p,token=',',terminal=TRUE,text=',',
               stringsAsFactors=FALSE),
    DF[(idxtok2+1):idxtok3,],
    data.frame(id=max(DF$id)+3,parent=p,token=')',terminal=TRUE,text=')',
               stringsAsFactors=FALSE),
    if(idxtok3<nrow(DF)) DF[(idxtok3+1):nrow(DF),] else NULL
  )
}

# print the new expression
recreateExpr(DF)

> [1] "pow((pow((2-x+3),2)+pow((x-5+7),10)),0.5)"
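For readers who have not seen parse data before, the sketch below prints the table for a tiny expression, so the DF manipulated above is easier to picture. The variable pd is illustrative. Passing keep.source = TRUE is an addition here: as far as I know, it is needed when the code runs outside an interactive session, where source references are not kept by default.

```r
# keep.source = TRUE so that parse data is recorded in non-interactive runs too
pd <- getParseData(parse(text = "x^2", keep.source = TRUE))

# One row per token or sub-expression; 'parent' links the rows into a tree
pd[, c("id", "parent", "token", "terminal", "text")]

# The caret token is labelled "'^'" (quotes included), which is what the
# DF$token == "'^'" test above relies on
pd$text[pd$token == "'^'"]
## [1] "^"
```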

Replace the ^ (power) sign with C's pow syntax in a mathematical expression
