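Java parser for complex logical conditions

Time: 2015-09-22 21:00:11

Tags: java algorithm parsing logical-operators recursive-descent

I have a set of incoming records that need to be evaluated against a set of logical clauses that are defined and stored. An example clause looks like this: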

Acct1 != 'Y' AND Acct2 > 1004 AND Acct3 >= 96 AND Acct4 < 1004 AND Acct5 = 99 AND ((Acct6 <= 9090 OR Acct7 IN (A1,A2,A6) AND Acct1 NOT IN (A3,A4)) AND Formatted LIKE 'LINUX' AND Acct9 NOT LIKE 'WINDOWS' AND (Acct10 = 'N' AND NOT Acct11 = 'N') AND EditableField BETWEEN (10 AND 20) )

My input data for this clause is as follows:

map.put("Acct1", "Y");
map.put("Acct2", 1010);
map.put("Acct3", 99);
map.put("Acct4", 1015);
map.put("Acct5", 99);
map.put("Acct6", 9090);
map.put("Acct7", "A3");
map.put("Formatted", "LINUX_INST");
map.put("Updated", "LINUX_TMP");
map.put("Acct10", "Y");
map.put("Acct11", "N");
map.put("EditableField", 25);

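I have to evaluate the incoming records, populated into the map, against the clause defined above and print true or false depending on the result.

Both the clause conditions and the map values will change between executions.

These are the operators I need to evaluate: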

!=
>
>=
<
=
<=
IN(
NOT IN(
LIKE(
NOT LIKE(
BETWEEN(
AND
OR
AND NOT
OR NOT

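I tried using a grammar generator, but I was told it is not the recommended solution for our application, so I am looking for plain Java code. I have this detailed example to refer to for AND, OR and =: "resolving logical operations - AND, OR, looping conditions dynamically". Where possible, I am looking for the most basic snippets to build on.

2 Answers:

Answer 0: (score: 4)

If you want to avoid a parser generator, consider implementing a recursive descent parser on top of StreamTokenizer, with one method per grammar rule.

For a subset of your grammar, the parser should look roughly like this (and it should be straightforward to extend to your full grammar):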
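To see what such a parser has to work with, it helps to look at the tokens StreamTokenizer produces with its default settings. The following is a minimal sketch; `TokenDump` and `tokens` are hypothetical names used only for illustration. Note that every number comes back as a double, and that a two-character operator such as `>=` or `!=` arrives as two separate single-character tokens, so the parser has to combine them itself.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the answer): lists the tokens that a
// default-configured StreamTokenizer yields for an expression.
class TokenDump {

  static List<String> tokens(String expr) {
    StreamTokenizer t = new StreamTokenizer(new StringReader(expr));
    List<String> out = new ArrayList<>();
    try {
      while (t.nextToken() != StreamTokenizer.TT_EOF) {
        if (t.ttype == StreamTokenizer.TT_WORD) {
          out.add("WORD(" + t.sval + ")");          // identifiers and keywords
        } else if (t.ttype == StreamTokenizer.TT_NUMBER) {
          out.add("NUMBER(" + t.nval + ")");        // always a double
        } else if (t.ttype == '\'' || t.ttype == '"') {
          out.add("STRING(" + t.sval + ")");        // quoted literal, quotes stripped
        } else {
          out.add("CHAR(" + (char) t.ttype + ")");  // any other single character
        }
      }
    } catch (IOException e) {
      // Reading from a StringReader should not fail; rethrow unchecked to be safe.
      throw new UncheckedIOException(e);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(tokens("Acct2 >= 1004 AND Acct1 != 'Y'"));
  }
}
```

Keywords such as AND and OR are reported as ordinary words, which is why a parser built this way compares `sval` against the keyword text.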

import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class Parser {

  // Parses the whole expression and fails if any input is left over.
  public static Node parse(String expr) throws IOException {
    StreamTokenizer tokenizer =
        new StreamTokenizer(new StringReader(expr));
    tokenizer.nextToken();
    Parser parser = new Parser(tokenizer);
    Node result = parser.parseExpression();
    if (tokenizer.ttype != StreamTokenizer.TT_EOF) {
      throw new RuntimeException("EOF expected, got "
          + tokenizer.ttype + "/" + tokenizer.sval);
    }
    return result;
  }

  private StreamTokenizer tokenizer;

  private Parser(StreamTokenizer tokenizer) {
    this.tokenizer = tokenizer;
  }

  // expression := and [ "OR" expression ]
  private Node parseExpression() throws IOException {
    Node left = parseAnd();
    if (tokenizer.ttype == StreamTokenizer.TT_WORD
        && tokenizer.sval.equals("OR")) {
      tokenizer.nextToken();
      return new OperationNode(OperationNode.Type.OR,
          left, parseExpression());
    }
    return left;
  }

  // and := relational [ "AND" and ]
  private Node parseAnd() throws IOException {
    Node left = parseRelational();
    if (tokenizer.ttype == StreamTokenizer.TT_WORD
        && tokenizer.sval.equals("AND")) {
      tokenizer.nextToken();
      return new OperationNode(OperationNode.Type.AND,
          left, parseAnd());
    }
    return left;
  }

  // relational := primary [ ("<" | "=" | ">") relational ]
  private Node parseRelational() throws IOException {
    Node left = parsePrimary();
    OperationNode.Type type;
    switch (tokenizer.ttype) {
      case '<': type = OperationNode.Type.LESS; break;
      case '=': type = OperationNode.Type.EQUALS; break;
      case '>': type = OperationNode.Type.GREATER; break;
      default:
        return left;
    }
    tokenizer.nextToken();
    return new OperationNode(type, left, parseRelational());
  }

  // primary := "(" expression ")" | quoted string | number | field name
  private Node parsePrimary() throws IOException {
    Node result;
    if (tokenizer.ttype == '(') {
      tokenizer.nextToken();
      result = parseExpression();
      if (tokenizer.ttype != ')') {
        throw new RuntimeException(") expected, got "
            + tokenizer.ttype + "/" + tokenizer.sval);
      }
    } else if (tokenizer.ttype == '"' || tokenizer.ttype == '\'') {
      result = new LiteralNode(tokenizer.sval);
    } else if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
      result = new LiteralNode(tokenizer.nval);
    } else if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
      result = new FieldNode(tokenizer.sval);
    } else {
      throw new RuntimeException("Unrecognized token: "
          + tokenizer.ttype + "/" + tokenizer.sval);
    }
    tokenizer.nextToken();
    return result;
  }
}

This assumes a Node object hierarchy along the following lines:

import java.util.Map;

interface Node {
   Object eval(Map<String,Object> data);
}

// Looks up a field of the incoming record by name.
class FieldNode implements Node {
   private String name;
   FieldNode(String name) {
     this.name = name;
   }
   public Object eval(Map<String,Object> data) {
     return data.get(name);
   }
}

// A constant taken from the clause text (quoted string or number).
class LiteralNode implements Node {
   private Object value;
   LiteralNode(Object value) {
     this.value = value;
   }
   public Object eval(Map<String,Object> data) {
     return value;
   }
}

class OperationNode implements Node {
  enum Type {
    AND, OR, LESS, GREATER, EQUALS
  }
  private Type type;
  private Node leftChild;
  private Node rightChild;

  OperationNode(Type type, Node leftChild, Node rightChild) {
    this.type = type;
    this.leftChild = leftChild;
    this.rightChild = rightChild;
  }

  public Object eval(Map<String,Object> data) {
    Object left = leftChild.eval(data);
    Object right = rightChild.eval(data);
    switch (type) {
      case AND: return ((Boolean) left) && ((Boolean) right);
      case OR: return ((Boolean) left) || ((Boolean) right);
      case LESS: return compare(left, right) < 0;
      case EQUALS: return compare(left, right) == 0;
      case GREATER: return compare(left, right) > 0;
      default:
        throw new RuntimeException("Invalid op: " + type);
    }
  }

  // StreamTokenizer yields numeric literals as Double, while the map may hold
  // Integer values, so numbers are compared by their double value.
  @SuppressWarnings("unchecked")
  private static int compare(Object left, Object right) {
    if (left instanceof Number && right instanceof Number) {
      return Double.compare(((Number) left).doubleValue(),
          ((Number) right).doubleValue());
    }
    return ((Comparable<Object>) left).compareTo(right);
  }
}
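The parser above covers only AND, OR, <, = and >. As a rough illustration of how the same one-method-per-rule structure stretches to !=, <=, >=, NOT, IN and NOT IN, here is a compact sketch. It is an assumption-laden variant rather than the answer's code: `ClauseEvaluator` and its methods are hypothetical names, it evaluates while parsing instead of building Node objects, it treats bare words inside an IN list as string literals, and it compares non-numeric values by their string form. LIKE and BETWEEN are left out.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: evaluates a clause directly while parsing it.
class ClauseEvaluator {
  private final StreamTokenizer t;
  private final Map<String, Object> data;

  private ClauseEvaluator(String expr, Map<String, Object> data) {
    this.t = new StreamTokenizer(new StringReader(expr));
    this.data = data;
    next();
  }

  static boolean evaluate(String expr, Map<String, Object> data) {
    ClauseEvaluator e = new ClauseEvaluator(expr, data);
    boolean result = e.or();
    if (e.t.ttype != StreamTokenizer.TT_EOF) throw new IllegalArgumentException("EOF expected, got " + e.t);
    return result;
  }

  private void next() {
    try { t.nextToken(); } catch (IOException ex) { throw new UncheckedIOException(ex); }
  }

  // Consumes the keyword if it is the current token.
  private boolean word(String w) {
    if (t.ttype == StreamTokenizer.TT_WORD && t.sval.equalsIgnoreCase(w)) { next(); return true; }
    return false;
  }

  // Consumes the single-character token if it is the current token.
  private boolean sym(char c) {
    if (t.ttype == c) { next(); return true; }
    return false;
  }

  private boolean or() {
    boolean v = and();
    while (word("OR")) v |= and();   // non-short-circuit: the right side must still be parsed
    return v;
  }

  private boolean and() {
    boolean v = not();
    while (word("AND")) v &= not();
    return v;
  }

  private boolean not() {
    return word("NOT") ? !not() : comparison();
  }

  private boolean comparison() {
    if (sym('(')) {
      boolean v = or();
      if (!sym(')')) throw new IllegalArgumentException(") expected, got " + t);
      return v;
    }
    Object left = operand(false);
    boolean negate = word("NOT");
    if (word("IN")) {
      if (!sym('(')) throw new IllegalArgumentException("( expected, got " + t);
      boolean found = false;
      do { found |= compare(left, operand(true)) == 0; } while (sym(','));
      if (!sym(')')) throw new IllegalArgumentException(") expected, got " + t);
      return found != negate;
    }
    if (negate) throw new IllegalArgumentException("IN expected after NOT");
    // Operators such as >= arrive as two single-character tokens; glue them together.
    String op = "";
    while (t.ttype == '<' || t.ttype == '>' || t.ttype == '=' || t.ttype == '!') { op += (char) t.ttype; next(); }
    int c = compare(left, operand(false));
    switch (op) {
      case "=":  return c == 0;
      case "!=": return c != 0;
      case "<":  return c < 0;
      case "<=": return c <= 0;
      case ">":  return c > 0;
      case ">=": return c >= 0;
      default: throw new IllegalArgumentException("Unknown operator: '" + op + "'");
    }
  }

  // A number, a quoted string, or a bare word (field lookup, or a literal inside an IN list).
  private Object operand(boolean bareWordIsLiteral) {
    Object v;
    if (t.ttype == StreamTokenizer.TT_NUMBER) v = t.nval;
    else if (t.ttype == '\'' || t.ttype == '"') v = t.sval;
    else if (t.ttype == StreamTokenizer.TT_WORD) v = bareWordIsLiteral ? t.sval : data.get(t.sval);
    else throw new IllegalArgumentException("Unexpected token: " + t);
    next();
    return v;
  }

  // Numbers compare by double value; everything else by string form (an assumption of this sketch).
  private static int compare(Object a, Object b) {
    if (a instanceof Number && b instanceof Number) {
      return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
    }
    return String.valueOf(a).compareTo(String.valueOf(b));
  }

  public static void main(String[] args) {
    Map<String, Object> data = new HashMap<>();
    data.put("Acct1", "Y");
    data.put("Acct2", 1010);
    data.put("Acct7", "A3");
    System.out.println(evaluate("Acct2 >= 1004 AND Acct7 NOT IN (A1,A2,A6)", data));
    System.out.println(evaluate("Acct1 != 'Y' AND Acct2 > 1004", data));
  }
}
```

With the sample data in `main`, the first clause should come out true and the second false. Evaluating during the parse keeps the sketch short; if the same clause is applied to many records, building the Node tree once and calling eval per record is likely the better trade-off.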

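Answer 1: (score: 2)

To answer the question directly: a number of SO questions (e.g. 1, 2) describe the basics of writing a parser by hand. In practice, though, hand-writing a parser is very unusual outside of university compiler courses, because of the boilerplate and exacting detail involved.

As discussed in the comments, it sounds like the main reason for avoiding a grammar generator is to avoid a dependency on an external library. However, with a grammar generator (parser generator) such as JavaCC (Java Compiler-Compiler), no JAR files or external dependencies are involved: the JavaCC binary converts a grammar specification into Java code, which can then be run without any further libraries.

Take as an example the IBM tutorial "Use JavaCC to build a user friendly boolean query language" by JoAnn Brereton, which happens to involve a grammar for a search language not too different from yours.

Example inputs: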

actor = "Christopher Reeve" and keyword=action and keyword=adventure
(actor = "Christopher Reeve" and keyword=action) or keyword=romance
actor = "Christopher Reeve" and (keyword=action or keyword=romance)

Grammar excerpt:

TOKEN : 
{
<STRING : (["A"-"Z", "0"-"9"])+ >
| <QUOTED_STRING: "\"" (~["\""])+ "\"" >
}

void queryTerm() :
{
}
{
        (<TITLE> | <ACTOR> |
         <DIRECTOR> | <KEYWORD>)
        ( <EQUALS> | <NOTEQUAL>)
        ( <STRING> | <QUOTED_STRING> )
        |
       <LPAREN> expression() <RPAREN>
}

Output files:

  • UQLParser.java
  • UQLParserConstants.java
  • UQLParserTokenManager.java
  • TokenMgrError.java
  • ParseException.java
  • Token.java
  • SimpleCharStream.java

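This is just one of several parser generators you could consider; others, such as yacc and bison, can also generate standalone Java files that need no external libraries. If necessary, you can check the generated Java files directly into your repository and keep the .jj compiler source file around only for when the grammar needs adjusting. (Compiling fresh from source as part of the build, and keeping generated files out of source control, would arguably be better, but checking them in may fit your Java-only constraint more closely.)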