Handling function application in λ-expressions with a regular expression

Time: 2017-05-23 12:15:20

Tags: .net regex powershell lambda-calculus

Hello!

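I will first give a short definition of λ-expressions. A λ-expression can be:

  • A variable (here, assume it is a single lowercase letter [a-z]), or any simple operation on variables (such as a*b or (a+b)*c).
  • A function (or abstraction). It has the following syntax: (λx.e), where x is a dummy variable (any lowercase letter) and e is a λ-expression (possibly containing occurrences of x). It can be read as: function λ : x -> e(x)
  • A function application. It has the following syntax: (f e) (note that I require the space and both parentheses), where f and e are both λ-expressions. It can be read as: f(e). The reduction operation basically means evaluating f(e).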

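If you want to know more about lambda calculus, see the following link.

Now, I am trying to find a regex that performs the reduction operation on a function application. In other words, inside the abstraction, I want to replace every occurrence of the dummy variable (except the one before the .) with the expression that follows, and return the resulting expression.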

Here are some examples:
(for typing purposes, λ is replaced by \ in the strings)
string => result after one reduction
((\x.x) a) => a
((\x.x) (\y.y)) => (\y.y)
(((\x.(\y.x+y)) a) b) => ((\y.a+y) b)
((\y.a+y) b) => a+b
((\x.x) (f g)) => (f g)
((\x.x) ((\y.y) a)) => ((\x.x) a) or ((\y.y) a) (whichever you think is easier to do; my guess would be the first)
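
For reference, the expected results above can be reproduced with a few lines of ordinary code. The following is a minimal Python sketch, not a regex and not the approach asked for in this question; the names `matching_paren` and `reduce_once` are made up for this illustration. It assumes well-formed input in the exact syntax defined above and does a naive textual substitution, so it does not check whether a variable is free or shadowed.

```python
def matching_paren(text, start):
    """Return the index of the ')' closing the '(' at text[start], or -1."""
    depth = 0
    for position in range(start, len(text)):
        if text[position] == "(":
            depth += 1
        elif text[position] == ")":
            depth -= 1
            if depth == 0:
                return position
    return -1


def reduce_once(expr):
    """Reduce the leftmost application of an abstraction to an argument, once."""
    for i in range(len(expr)):
        # A reducible application starts with two '(' and a backslash,
        # followed by one variable character and a '.'.
        if not expr.startswith("((\\", i) or expr[i + 4:i + 5] != ".":
            continue
        close_abs = matching_paren(expr, i + 1)  # ')' ending the abstraction
        close_app = matching_paren(expr, i)      # ')' ending the application
        if close_abs == -1 or close_app == -1:
            continue
        if expr[close_abs + 1] != " ":
            continue
        variable = expr[i + 3]
        body = expr[i + 5:close_abs]
        argument = expr[close_abs + 2:close_app]
        # Naive substitution: every occurrence of the variable is replaced.
        return expr[:i] + body.replace(variable, argument) + expr[close_app + 1:]
    return expr  # nothing to reduce


for example in [r"((\x.x) a)", r"(((\x.(\y.x+y)) a) b)", r"((\y.a+y) b)"]:
    print(example, "=>", reduce_once(example))
```

Because the sketch always picks the leftmost application, the last example in the list reduces to ((\y.y) a), that is, the outer application is reduced first.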


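It can be done with several substitutions, but I would prefer no more than 2. The language I use is PowerShell, so the regex has to work in the .NET flavor (which means recursion is not allowed...).
I am pretty sure balancing groups are involved, but I cannot come up with a regex that works...
Also, there are certainly better solutions than a regex, but I want to do this with a regex, no code here.
I will add more examples when I think of good ones.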

Edit 1:

All I have managed to do so far is match an expression and capture every sub-expression with the following regex:

(?:[^()]|(?'o'\()|(?'c-o'\)))*(?(o)(?!))
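
To make the counting idea concrete, here is a minimal Python sketch of the bookkeeping that the balancing groups perform: `(?'o'\()` pushes a capture, `(?'c-o'\))` pops one, and `(?(o)(?!))` fails if anything is left over. This is plain code rather than a regex, since Python's built-in `re` module has no balancing groups. The name `is_balanced` is made up for this illustration, and the sketch checks a whole string instead of searching for a match inside it.

```python
def is_balanced(text):
    """Return True if every '(' has a later ')' and no ')' comes too early."""
    open_stack = []  # plays the role of the 'o' capture stack
    for position, char in enumerate(text):
        if char == "(":
            open_stack.append(position)  # like (?'o'\()
        elif char == ")":
            if not open_stack:           # nothing to pop: this alternative cannot match
                return False
            open_stack.pop()             # like (?'c-o'\))
    return not open_stack                # like (?(o)(?!))


print(is_balanced(r"((\x.x) a)"))  # True
print(is_balanced(r"((\x.x) a"))   # False
```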

Demo here

Edit 2:

I have made some progress here with this regex:

(?>\((?'c')\\(\w)\.|[^()]+|\)(?'-c'))+(?(c)(?!))(?=\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!))))

Demo here
What I need now is to match only the second y instead of the current match.

Edit 3:

I feel like no one is able to help me here... maybe I am asking for something too hard :(
However, I almost have what I need. Here is what I came up with:

(?<=\(\\(\w)\.(?>\((?'c')|\)(?'-c')|(?>(?!\(|\)|\1).)*)*)\1(?=(?>\((?'c')|\)(?'-c')|(?>(?!\(|\)|\1).)*)*(?(c)(?!))\)\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!))))

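Demo here
As you can see, I can only match the variable to replace when it occurs once. When it occurs several times, only the last occurrence is matched (which seems obvious when looking at the regex, although I do not understand why it is the last one rather than the first...)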

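Edit 4:

OK, I am almost done! I only have a problem with the third line: the regex does not match it correctly and I cannot understand why. As soon as I figure out why this string does not match, I will post an answer to this question. Here is the regex (it is unreadable right now; I will post a commented version later):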

(?:(?<=\(\\(\w)\.(?>\((?'c')|\)(?'-c')|[^()\n])*)\1(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!)))))|(?:\(\(\\\w\.(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)))|(?:(?<=\(\(\\\w\.(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!)))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\))

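Demo here

Final edit: never mind, I found the problem. It was just a badly written lookbehind; just read the answer below.

1 Answer:

Answer 0 (score: 1)

OK, I got it. It is quite a long regex, so read it at your own risk ;) Here it is: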

(?x)  # Turns on free-spacing mode
      # The regex is an alternation of 3 groups
      # Each group corresponds to one part of the string
      # One for the function definition to replace the parameter, by the argument
      # in ((\x.(x+b)*c) a), it's (x+b)*c, with x matched and replaced by a
      # One for the beginning of the function definition (to replace it by nothing)
      # in ((\x.(x+b)*c) a), it's ((\x.
      # And the third one for the closing parenthesis and the argument
      # in ((\x.(x+b)*c) a), it's ) a)
(?:                # 1st non capturing group
  (?<=             # Positive lookbehind
    \(\\(\w)\.     # Look for the sequence '(\x.' where x is captured in group 1
    (?>            # Atomic group
                   # (No need to make it atomic here, it was just for reading purpose)
                   # Here come the balancing groups. You can see them as counters
      \((?(c)(?'-c')|(?'o')) |  # Look for a '(' then decrease 'c' counter or increase 'o' if 'c' is already 0
      \)(?(o)(?'-o')|(?'c')) |  # Look for a ')' then decrease 'o' counter or increase 'c' if 'o' is already 0
      [^()\n]      # Look for a character that is not a new line nor a parenthesis
                   # Note that preventing \n is just for text with multiple λ-expressions, one per line
    )*             # Repeat
  )  # End of lookbehind
  \1             # Match the parameter
                 # Note that if it is a constant function, it will not be matched.
                 # However the reduction will still be done thanks to other groups
  (?=            # Positive lookahead
    (?>          # Atomic group. It's the same as the previous one
    \((?(c)(?'-c')|(?'o')) |  # All atomic groups here actually mean 'look for a legal λ-expression'
    \)(?(o)(?'-o')|(?'c')) |
    [^()\n]
  )*
  # this is where balancing groups really come into play
  # We are now going to check if number of '(' equals number of ')'
  (?(o)(?!))     # Fail if 'o' is not 0 (meaning there are more '(' than ')')
  (?(c)(?!))     # Fail if 'c' is not 0 (meaning there are more ')' than '(')
  \)\s           # Look for a ')' and a space
    (            # Capturing group 2. Here come the argument
      (?>\((?'c')|\)(?'-c')|[^()\n])+(?(c)(?!))  # Again, look for a legal λ-expression
    )            # End of capturing group
  \) # Look for a ')'
  )  # End of lookahead
) |  # End of 1st non-capturing group
(?:  # 2nd non-capturing group
  \(\(\\\w\.     # Match '((\x.'
  (?=            # Positive lookahead
    (?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))  # Look for a legal λ-expression
    \)\s         # Followed by ')' and a space
    (?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))  # Followed by a legal λ-expression
    \)           # Followed by a ')'
  )  # End of lookahead
) |  # End of 2nd non-capturing group
(?:  # 3rd non-capturing group
  (?<=           # Positive lookbehind
    \(\(\\\w\.   # Look for '((\x.'
    (?>\((?'-c')|\)(?'c')|[^()\n])*
           # Here is what caused issues for my 4th edit.
           # I am not sure why, but the engine seems to read it from right to left
           # So I had, like before :
           # (?'c') for '(' (increment)
           # (?'-c') for ')' (decrement)
           # But from right to left, we encounter ')' first, so "decrement" first
           # By "decrement", I mean pop the stack, which is still empty
           # So parenthesis were not balanced anymore
           # That is why (?'c') and (?'-c') were swapped here
    (?(c)(?!))   # Check parenthesis count
  )  # End of lookbehind
  \)\s           # Match a ')' and a space
  (?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))  # Then a legal λ-expression
  \)             # And finally the last ')' of the function application
)  # End of 3rd non-capturing group

So here is the compact regex:

(?:(?<=\(\\(\w)\.(?>\((?(c)(?'-c')|(?'o'))|\)(?(o)(?'-o')|(?'c'))|[^()\n])*)\1(?=(?>\((?(c)(?'-c')|(?'o'))|\)(?(o)(?'-o')|(?'c'))|[^()\n])*(?(o)(?!))(?(c)(?!))\)\s((?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!)))\)))|(?:(?<!\(\(\\\w\..*)\(\(\\\w\.(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)))|(?:(?<=\(\(\\\w\.(?>\((?'-c')|\)(?'c')|[^()\n])*(?(c)(?!)))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)(?!.*\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)))

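The compact regex is not exactly the same as the verbose one. I just added two negative lookarounds to make sure that only one reduction is performed per line. Multiple reductions could be a problem in large expressions, because in some cases they might overlap...

You need to replace the matches with $2. The second capturing group is only set in the case of the first alternative, so it will be either empty or the argument of the function application.
Demo here

There is still a lot to improve or correct, so I may update this as I work on it.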

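Edit:

OK, I found the problem. This should be the last edit. I do not think I can use a single counter to handle the function definition (as I call it in the code comments): the size of a stack cannot be negative, so the counter cannot go negative. I have to use 2 stacks, one for ( and one for ), and then test whether they have the same size. Have a look at the code if you want to know more.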
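
As an illustration of why two stacks are needed, here is a minimal Python sketch of the counting done by `\((?(c)(?'-c')|(?'o'))` and `\)(?(o)(?'-o')|(?'c'))` in the first group, as I read them. It is plain code, not a regex, and the name `surplus` is made up for this example. Together the two counters behave like one signed counter, so the result is the same whether the parentheses are read left to right or right to left (which is how the lookbehind appears to read them).

```python
def surplus(chars):
    """Return (unmatched '(' count, unmatched ')' count) for chars, read in order."""
    opens = 0   # plays the role of stack 'o'
    closes = 0  # plays the role of stack 'c'
    for char in chars:
        if char == "(":
            if closes:
                closes -= 1  # cancels a ')' seen earlier, like (?'-c')
            else:
                opens += 1   # like (?'o')
        elif char == ")":
            if opens:
                opens -= 1   # cancels a '(' seen earlier, like (?'-o')
            else:
                closes += 1  # like (?'c')
    return opens, closes


body = "(x+b)*c"
index = body.index("x")
before = body[:index][::-1]  # text before the parameter, read right to left
after = body[index + 1:]     # text after the parameter, read left to right

print(surplus(before))          # (1, 0)
print(surplus(before + after))  # (0, 0)
```

A single counter that cannot drop below zero has no way to record the `(0, 1)` state that a lone `)` produces, which is the situation described in this edit. Note that this scheme also accepts `)(` as balanced, so it only compares counts.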

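Caution: this regex works for most λ-expressions, but it does not test whether a variable is free. I have not found any λ-expression that this regex fails on, though that does not mean there is none. I will not try to prove that it works for every λ-expression ;)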