Regular expression for boolean logic

Time: 2017-04-17 12:10:08

Tags: ruby regex

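I am trying to validate a string with a regular expression. It should allow spaces between a string and a boolean operator, as in (@string1 OR), but it should not allow spaces inside a string, as in (string 1). The other boolean logic that is allowed is: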

(A AND B) AND (NOT C)
(A OR B) AND (NOT C)
(A AND B)
(A OR B)
(NOT C)

Examples of valid and invalid input are listed below.

Valid:

(@string1 OR @string2) AND ( NOT @string3)
(@string-1 AND @string.2) AND ( NOT @string_3)
(@string1 OR @string2 OR @string4) AND ( NOT @string3 AND NOT @string5)
(@string1    OR   @string2   OR    @string4)
(@string1 AND @string2 AND @string4)
( NOT @string1 AND NOT @string2 AND NOT @string4)
( NOT @string1 AND NOT @string2)

Invalid:

()
(string  1 OR @str ing2) AND ( NOT @tag3)
(@string 1 OR @tag 2) AND ( NOT @string 3)
(@string1  @string2) ( NOT @string3)
(@string1 OR @string12) AND (@string3)
(@string1 AND NOT @string2)

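Is it better to parse the string and then have several regular expressions check that there are no spaces, or can a single regular expression be written to check the whole string?

2 answers:

Answer 0 (score: 1)

Validation this complex is best handled with a parser.

To get you started, here is an (incomplete) parser. As you can see, you start from primitives and build up increasingly complex structures.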

require 'parslet'

class Boolean < Parslet::Parser
  rule(:space)  { match[" "].repeat(1) }
  rule(:space?) { space.maybe }

  rule(:lparen) { str("(") >> space? }
  rule(:rparen) { str(")") >> space? }

  rule(:and_operator) { str("AND") >> space? }
  rule(:or_operator) { str("OR") >> space? }
  rule(:not_operator) { str("NOT") >> space? }

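  # A token is "@" followed by lowercase letters and digits only. Because
  # repeat has no minimum, a bare "@" also matches; "-", "." and "_" from the
  # question's examples are not accepted yet.
  rule(:token) { str("@") >> match["a-z0-9"].repeat >> space? }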

  # The primary rule deals with parentheses.
  rule(:primary) { lparen >> expression >> rparen | token }

  rule(:and_expression) { primary >> and_operator >> primary }
  rule(:or_expression) { primary >> or_operator >> primary }
  rule(:not_expression) { not_operator >> primary }

  rule(:expression) { or_expression | and_expression | not_expression | primary }

  root(:expression)
end

You can test strings with this small helper method:

def parse(str)
  exp = Boolean.new
  exp.parse(str)
  puts "Valid!"
rescue Parslet::ParseFailed => failure
  puts failure.parse_failure_cause.ascii_tree
end

parse("@string AND (@string2 OR @string3)")
#=> Valid!
parse("(string1 AND @string2)")
#=> Expected one of [OR_EXPRESSION, AND_EXPRESSION, NOT_EXPRESSION, PRIMARY] at line 1 char 1.
#   ...
#   - Failed to match sequence ('@' [a-z0-9]{0, } SPACE?) at line 1 char 2.
#      - Expected "@", but got "s" at line 1 char 2.
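
If pulling in the parslet gem is not an option, the same layering (primitives first, then larger structures) can be sketched by hand with Ruby's standard-library StringScanner. This is only an illustration, not part of the parser above: the class name BooleanValidator and its method names are hypothetical, the token character set (letters, digits, "_", "." and "-") is an assumption taken from the question's valid examples, and the grammar is deliberately general, so it accepts nested and mixed expressions that the question's stricter rules would reject.

```ruby
require 'strscan'

# Hypothetical hand-written recursive-descent validator (illustrative names).
class BooleanValidator
  # Assumed token shape: "@" plus at least one word character, "." or "-".
  TOKEN = /@[\w.\-]+/

  def self.valid?(input)
    new(input).valid?
  end

  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # The whole input must be consumed by exactly one expression.
  def valid?
    skip_space
    expression && @scanner.eos?
  end

  private

  def skip_space
    @scanner.skip(/\s*/)
  end

  # expression := term ((AND | OR) term)*
  def expression
    return false unless term
    while @scanner.scan(/(?:AND|OR)\b/)
      skip_space
      return false unless term
    end
    true
  end

  # term := NOT term | "(" expression ")" | token
  def term
    if @scanner.scan(/NOT\b/)
      skip_space
      term
    elsif @scanner.scan(/\(/)
      skip_space
      return false unless expression && @scanner.scan(/\)/)
      skip_space
      true
    elsif @scanner.scan(TOKEN)
      skip_space
      true
    else
      false
    end
  end
end

BooleanValidator.valid?("@string AND (@string2 OR @string3)") # => true
BooleanValidator.valid?("(@string 1 OR @tag 2)")              # => false
```

Each private method corresponds to one grammar rule, and the recursion in term is what lets parentheses nest to any depth. Restricting it to the exact shapes in the question (for example, rejecting a group that mixes plain and NOT operands) would need extra checks on top of this sketch.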

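Answer 1 (score: 0)

You need recursion, or a loop with a stack, to parse this properly. Validating it with a single regular expression would be very difficult, though not impossible.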
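
For the fixed, non-nested shapes listed in the question, a single expression does seem feasible if it is assembled from smaller named pieces. The following is a sketch under assumptions inferred from the examples, not a definitive rule set: a token is "@" followed by letters, digits, "_", "." or "-"; a plain group holds at least two tokens joined by one repeated operator; a NOT group joins its members with AND only. The constant and method names are hypothetical.

```ruby
# Assumed token shape, taken from the question's valid examples.
QUERY_TOKEN = /@[\w.\-]+/

# (A AND B [AND C ...]) or (A OR B [OR C ...]); operators are not mixed.
PLAIN_GROUP = /
  \( \s* #{QUERY_TOKEN}
  (?: (?: \s+ AND \s+ #{QUERY_TOKEN} )+ | (?: \s+ OR \s+ #{QUERY_TOKEN} )+ )
  \s* \)
/x

# (NOT A [AND NOT B ...])
NOT_GROUP = /
  \( \s* NOT \s+ #{QUERY_TOKEN}
  (?: \s+ AND \s+ NOT \s+ #{QUERY_TOKEN} )*
  \s* \)
/x

# A plain group, a NOT group, or a plain group AND a NOT group.
BOOLEAN_QUERY = /
  \A \s*
  (?: #{PLAIN_GROUP} (?: \s+ AND \s+ #{NOT_GROUP} )? | #{NOT_GROUP} )
  \s* \z
/x

# Hypothetical helper: true when the whole string matches one allowed shape.
def valid_query?(str)
  BOOLEAN_QUERY.match?(str)
end

valid_query?("(@string1 OR @string2) AND ( NOT @string3)") # => true
valid_query?("(@string1 AND NOT @string2)")                # => false
```

This only works because the allowed shapes have a fixed depth. Once groups can nest arbitrarily, the pattern can no longer be written as a flat alternation like this, and recursion or an explicit stack becomes the simpler tool.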