Parsing a string recursively with StringTokenizer

Date: 2017-07-01 13:02:26

Tags: java parsing recursion tree stringtokenizer

I am trying to parse a string recursively with StringTokenizer. The string represents a tree in the following format:

[(0,1),[(00,01,02),[()],[()]]]

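where each node's information is stored inside the parentheses, and the square-bracketed groups that follow are the node's children, separated by commas. For example, this string represents this tree: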

[Image: tree]

If a node has something inside its parentheses, it is a regular node; if the parentheses are empty, it is a leaf.
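It can help to look at the token stream the tokenizer actually produces. With the delimiter string "[(,)]" and returnDelims set to true (the third constructor argument), StringTokenizer returns every bracket, parenthesis and comma as a separate one-character token, plus the weight strings between them. A minimal sketch; the class name TokenDump is illustrative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenDump {
    // Splits the tree string into tokens, keeping each delimiter as its own token.
    static List<String> tokenize(String s) {
        StringTokenizer st = new StringTokenizer(s, "[(,)]", true);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [ ( 0 , 1 ) , [ ( 00 , 01 , 02 ) , [ ( ) ] , [ ( ) ] ] ]
        System.out.println(String.join(" ", tokenize("[(0,1),[(00,01,02),[()],[()]]]")));
    }
}
```

Each closing bracket, including the three at the very end, comes out as its own token, so every ']' the parser consumes has to match exactly one '['.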

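I wrote the code below to parse it. It works fine, but when the recursion ends, the tokenizer seems to have no more tokens left to analyze. The problem is that when it reaches the final brackets (]]]), it jumps straight to the last one, skipping the others.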

import java.util.*;

public class ParseString
{

    public void setParameters(String parameters) throws Exception {
        setParameters(new StringTokenizer(parameters, "[(,)]", true));
    }

    public void setParameters(StringTokenizer tokenizer) throws Exception {
        String buf;
        try {
            if (!(buf = tokenizer.nextToken()).equals("["))
                throw new Exception("Malformed string, found " + buf + " instead of [");
            boolean isLeaf = setWeights(tokenizer);
            System.out.println("Leaf: " + isLeaf);
            while (!(buf = tokenizer.nextToken()).equals("]")) {
                do {
                    setParameters(tokenizer);
                } while (!(tokenizer.nextToken().equals("]")));
                if (!(buf = tokenizer.nextToken()).equals(","))
                    break;
            }
        } catch (Exception e) { e.printStackTrace(); }
    }

    public boolean setWeights(StringTokenizer tokenizer) throws Exception {
        String buf;
        if (!(buf = tokenizer.nextToken()).equals("("))
            throw new Exception("Malformed string, found " + buf + " instead of (");
        do {
            buf = tokenizer.nextToken();
            if (buf.equals(")")) {
                return true;
            }
            if (!buf.equals(","))
                System.out.println(buf);
        } while (!tokenizer.nextToken().equals(")"));
        return false;
    }

    public static void main(String[] args)
    {
        ParseString ps = new ParseString();
        try {
            ps.setParameters("[(0,1),[(00,01,02),[()],[()]]]");
        } catch (Exception e) { e.printStackTrace(); }
    }
}

Here is the output when I run it:

 0
 1
 Leaf: false
 00
 01
 02
 Leaf: false
 Leaf: true
 Leaf: true
 java.util.NoSuchElementException
    at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
    at ParseString.setParameters(ParseString.java:22)
    at ParseString.setParameters(ParseString.java:7)
    at ParseString.main(ParseString.java:51)

One more thing: the parser should be able to handle any general tree, not just this one. I would be very grateful if someone could solve this.

2 answers:

Answer 0 (score: 1)

I think that in some cases your nested loops consume ']' twice, which can swallow the parent's closing bracket.

I would simply make the structure more explicit, like this:

import java.util.StringTokenizer;

// Precondition: '[' expected
// Postcondition: Matching ']' consumed
void parseNode(StringTokenizer st) {
  if (!st.nextToken().equals("[")) {
    throw new RuntimeException("[ expected parsing node.");
  }
  boolean leaf = parseWeights(st);
  System.out.println("isleaf: " + leaf);

  // After ')': parse the children, if any.

  String token = st.nextToken();
  while (token.equals(",")) {
    parseNode(st);
    token = st.nextToken();
  }
  if (!token.equals("]")) {
    throw new RuntimeException("] expected.");
  }
}

// Precondition: '(' expected
// Postcondition: Matching ')' consumed
boolean parseWeights(StringTokenizer st) {
  if (!st.nextToken().equals("(")) {
    throw new RuntimeException("( expected parsing node weights.");
  }
  String token = st.nextToken();
  if (token.equals(")")) {
    return true;
  }
  while(true) {
    System.out.println(token);
    token = st.nextToken();
    if (token.equals(")")) {
      break;
    }
    if (!token.equals(",")) {
      throw new RuntimeException(", or ) expected parsing weights.");
    }
    token = st.nextToken();
  }
  return false;
}   
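Since the question also asks for a parser that works on any tree, here is a hedged sketch that follows the same grammar as the two methods above but builds an actual tree instead of printing, and reports running out of input as a parse error. The Node class, its weights/children fields, and the parse, next and expect helpers are hypothetical names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TreeParser {
    // Hypothetical node type: weights from the parentheses, children from the nested brackets.
    static class Node {
        final List<String> weights = new ArrayList<>();
        final List<Node> children = new ArrayList<>();

        // A leaf is a node with empty parentheses, as described in the question.
        boolean isLeaf() {
            return weights.isEmpty();
        }
    }

    // Parses a whole string such as "[(0,1),[()]]" and rejects leftover tokens.
    static Node parse(String s) {
        StringTokenizer st = new StringTokenizer(s, "[(,)]", true);
        Node root = parseNode(st);
        if (st.hasMoreTokens()) {
            throw new IllegalArgumentException("Unexpected trailing token: " + st.nextToken());
        }
        return root;
    }

    // Grammar: node := '[' '(' [weight {',' weight}] ')' {',' node} ']'
    static Node parseNode(StringTokenizer st) {
        expect(st, "[");
        Node node = new Node();
        expect(st, "(");
        String token = next(st);
        if (!token.equals(")")) {
            while (true) {
                if ("[(,)]".contains(token)) {
                    throw new IllegalArgumentException("Weight expected, found " + token);
                }
                node.weights.add(token);
                token = next(st);
                if (token.equals(")")) break;
                if (!token.equals(",")) throw new IllegalArgumentException(", or ) expected, found " + token);
                token = next(st);
            }
        }
        // After ')': each ',' introduces one child; the loop stops at this node's own ']'.
        token = next(st);
        while (token.equals(",")) {
            node.children.add(parseNode(st));
            token = next(st);
        }
        if (!token.equals("]")) throw new IllegalArgumentException("] expected, found " + token);
        return node;
    }

    // Like nextToken(), but fails with a descriptive message when the input is exhausted.
    static String next(StringTokenizer st) {
        if (!st.hasMoreTokens()) throw new IllegalArgumentException("Unexpected end of input");
        return st.nextToken();
    }

    static void expect(StringTokenizer st, String expected) {
        String token = next(st);
        if (!token.equals(expected)) throw new IllegalArgumentException(expected + " expected, found " + token);
    }

    public static void main(String[] args) {
        Node root = parse("[(0,1),[(00,01,02),[()],[()]]]");
        System.out.println(root.weights);                                // [0, 1]
        System.out.println(root.children.size());                        // 1
        Node child = root.children.get(0);
        System.out.println(child.weights + " " + child.children.size()); // [00, 01, 02] 2
        System.out.println(child.children.get(0).isLeaf());              // true
    }
}
```

Because each call to parseNode consumes exactly its own '[' ... ']' pair, nesting depth and the number of children are unrestricted.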

Answer 1 (score: 0)

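You are calling tokenizer.nextToken() without checking whether another token is available (you can check that by calling tokenizer.hasMoreTokens()). You should check first, and if hasMoreTokens() returns false, simply leave the method with return;.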
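A minimal sketch of that check, using a hypothetical helper nextOrNull that returns null instead of throwing when the input is exhausted, so the caller can test for null and return:

```java
import java.util.StringTokenizer;

public class SafeTokens {
    // Hypothetical helper: the next token, or null when no tokens are left,
    // so callers can check for null and return instead of hitting NoSuchElementException.
    static String nextOrNull(StringTokenizer st) {
        return st.hasMoreTokens() ? st.nextToken() : null;
    }

    public static void main(String[] args) {
        StringTokenizer st = new StringTokenizer("[()]", "[(,)]", true);
        String token;
        while ((token = nextOrNull(st)) != null) {
            System.out.print(token + " ");
        }
        System.out.println();
        // st.nextToken() would throw here; nextOrNull just reports that nothing is left.
        System.out.println(nextOrNull(st)); // null
    }
}
```

This only prevents the exception; if the loops still consume one ']' too many, as the other answer points out, the tree is parsed incorrectly without any error.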

But IMO it is better to put all the tokens into a list first and then traverse it in a simpler way:

import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

String s = "[(0,1),[(00,01,02),[()],[()]]]";
StringTokenizer strtok = new StringTokenizer(s, "[(,)]", true);
// put tokens in a list
List<String> list = new ArrayList<>();
while (strtok.hasMoreTokens()) {
    list.add(strtok.nextToken());
}
// parse it, starting at position 0
parse(list, 0);

// parse method
public void parse(List<String> list, int position) {
    if (position > list.size() - 1) {
        // no more elements, stop
        return;
    }

    String element = list.get(position);
    if (")".equals(element)) { // end of node
        // is leaf if previous element was the matching "("
        System.out.println("Leaf:" + "(".equals(list.get(position - 1)));
    } else if (!("[".equals(element) || "(".equals(element) || "]".equals(element) || ",".equals(element))) {
        // print only contents of a node (ignoring delimiters)
        System.out.println(element);
    }

    // parse next element
    parse(list, position + 1);
}

The output is:

0
1
Leaf:false
00
01
02
Leaf:false
Leaf:true
Leaf:true
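The recursive parse above makes one call per token, so its recursion depth equals the number of tokens and a very long input could overflow the stack. Since each call only moves on to position + 1, the same traversal can be written as a plain loop. A sketch under that observation; the class name ListParse and the DELIMITERS set are illustrative, and the results are collected into a list instead of printed directly:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

public class ListParse {
    private static final Set<String> DELIMITERS = Set.of("[", "(", "]", ",");

    // Same logic as the recursive parse(list, position), written as a loop.
    static List<String> parse(List<String> list) {
        List<String> out = new ArrayList<>();
        for (int position = 0; position < list.size(); position++) {
            String element = list.get(position);
            if (")".equals(element)) {
                // Leaf if the token just before ')' is its matching '(' (empty parentheses)
                boolean leaf = position > 0 && "(".equals(list.get(position - 1));
                out.add("Leaf:" + leaf);
            } else if (!DELIMITERS.contains(element)) {
                // Node contents only; delimiters are skipped
                out.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        StringTokenizer st = new StringTokenizer("[(0,1),[(00,01,02),[()],[()]]]", "[(,)]", true);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        parse(tokens).forEach(System.out::println);
    }
}
```

For the sample string this prints the same nine lines as the recursive version.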

If you want nested/indented output, you can add a level parameter to the parse method:

public void parse(List<String> list, int position, int level) {
    if (position > list.size() - 1) {
        return;
    }
    String element = list.get(position);
    int nextLevel = level;

    if ("[".equals(element)) {
        nextLevel++;
    } else if ("]".equals(element)) {
        nextLevel--;
    } else if (")".equals(element)) {
        for (int i = 0; i < nextLevel; i++) {
            System.out.print("  ");
        }
        System.out.println("Leaf:" + "(".equals(list.get(position - 1)));
    } else if (!("(".equals(element) || "]".equals(element) || ",".equals(element))) {
        for (int i = 0; i < nextLevel; i++) {
            System.out.print("  ");
        }
        System.out.println(element);
    }

    parse(list, position + 1, nextLevel);
}

Then, if I call it (with the same list as above):

// starting at position zero and level zero
parse(list, 0, 0);

The output will be:

  0
  1
  Leaf:false
    00
    01
    02
    Leaf:false
      Leaf:true
      Leaf:true

All elements at the same level have the same indentation.
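The same level counter can also serve as a cheap sanity check before parsing: in a well-formed string, the number of open '[' never drops below zero and is back to zero at the end. A hedged sketch; isBalanced is a hypothetical helper, and it checks only the square brackets, not the parentheses or commas:

```java
import java.util.List;

public class BracketCheck {
    // True if every ']' closes an earlier '[' and every '[' is closed by the end.
    static boolean isBalanced(List<String> tokens) {
        int level = 0;
        for (String token : tokens) {
            if ("[".equals(token)) {
                level++;
            } else if ("]".equals(token)) {
                level--;
                if (level < 0) {
                    return false; // a ']' with no matching '['
                }
            }
        }
        return level == 0;
    }

    public static void main(String[] args) {
        System.out.println(isBalanced(List.of("[", "(", ")", "]")));      // true
        System.out.println(isBalanced(List.of("[", "(", ")", "]", "]"))); // false
    }
}
```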