How can I parse this format with Java?

Time: 2014-01-01 21:13:08

Tags: java parsing

Is there an API I can call to parse a file in the following format?

define student {
    full_name     Smith,John
    sex    male
    age      19
    grade      90
    class_number   8.43.1
    reg_hour  5x3
}

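The file is not cleanly formatted. As shown, each key is separated from its value by a varying number of spaces and \t characters.

Any suggestions for parsing this format in Java? Or in Python...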

2 answers:

Answer 0: (score: 1)

This should be straightforward in Java using StreamTokenizer:

http://docs.oracle.com/javase/6/docs/api/java/io/StreamTokenizer.html

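It skips all kinds of whitespace, but you need to call eolIsSignificant(true), because the values do not seem to have any other delimiter.

It should look roughly like this (I am not sure whether EOL significance can be switched on and off during parsing):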

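// Requires: import java.io.*;
// Assumes fileInputStream is an already opened java.io.InputStream.
StreamTokenizer tokenizer = new StreamTokenizer(new InputStreamReader(fileInputStream));
tokenizer.resetSyntax();            // drop the default number parsing so 19 and 8.43.1 stay text
tokenizer.whitespaceChars(0, ' ');  // spaces, tabs and line breaks separate tokens
tokenizer.wordChars('!', '~');      // every other printable ASCII character is part of a word
tokenizer.ordinaryChar('{');        // braces are returned as single-character tokens
tokenizer.ordinaryChar('}');
tokenizer.nextToken();
while ("define".equals(tokenizer.sval)) {
  tokenizer.nextToken();
  String recordName = tokenizer.sval;
  if (tokenizer.nextToken() != '{') {
    throw new RuntimeException("'{' expected");
  }
  while (tokenizer.nextToken() != '}') {
    if (tokenizer.ttype == StreamTokenizer.TT_EOF) {
      throw new RuntimeException("'}' expected");
    }
    String key = tokenizer.sval;
    tokenizer.nextToken();
    String value = tokenizer.sval;
    tokenizer.eolIsSignificant(true);
    // Call nextToken() only once per iteration; calling it twice skips every other token.
    int type;
    while ((type = tokenizer.nextToken()) != StreamTokenizer.TT_EOL &&
           type != StreamTokenizer.TT_EOF) {
      value += " " + tokenizer.sval;  // sval is a field. If this is common, use StringBuilder
    }
    tokenizer.eolIsSignificant(false);
    // key and value are complete here; store them, e.g. in a Map
  }
  tokenizer.nextToken();  // move on to the next "define", if there is one
}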
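
For readers who want something they can compile, here is a minimal self-contained sketch of the same tokenizer idea. The class name DefineTokenizerParser and the Map-based result are hypothetical choices made for illustration, not part of the answer above. It assumes one "key value" pair per line and ASCII input, and it leaves eolIsSignificant(true) on for the whole parse (skipping blank line ends explicitly) so that it does not depend on toggling the flag mid-stream. resetSyntax() is called because a default StreamTokenizer parses numbers, which would turn 19 into a numeric token and break up 8.43.1.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the name and the result type are illustrative only.
class DefineTokenizerParser {

    /** Parses "define NAME { key value ... }" blocks into name -> (key -> value). */
    static Map<String, Map<String, String>> parse(Reader in) {
        StreamTokenizer t = new StreamTokenizer(in);
        t.resetSyntax();            // turn off default number, quote and comment handling
        t.whitespaceChars(0, ' ');  // control characters, tabs and spaces separate tokens
        t.wordChars('!', '~');      // all other printable ASCII characters form words
        t.ordinaryChar('{');        // braces come back as single-character tokens
        t.ordinaryChar('}');
        t.eolIsSignificant(true);   // line ends are reported as TT_EOL

        Map<String, Map<String, String>> records = new LinkedHashMap<>();
        try {
            int type = skipEol(t);
            while (type == StreamTokenizer.TT_WORD && "define".equals(t.sval)) {
                if (t.nextToken() != StreamTokenizer.TT_WORD) {
                    throw new IllegalStateException("record name expected at line " + t.lineno());
                }
                String recordName = t.sval;
                if (t.nextToken() != '{') {
                    throw new IllegalStateException("'{' expected at line " + t.lineno());
                }
                Map<String, String> fields = new LinkedHashMap<>();
                while ((type = skipEol(t)) != '}') {
                    if (type != StreamTokenizer.TT_WORD) {
                        throw new IllegalStateException("key or '}' expected at line " + t.lineno());
                    }
                    String key = t.sval;
                    StringBuilder value = new StringBuilder();
                    // Every word up to the end of the line belongs to the value.
                    while ((type = t.nextToken()) == StreamTokenizer.TT_WORD) {
                        if (value.length() > 0) {
                            value.append(' ');
                        }
                        value.append(t.sval);
                    }
                    fields.put(key, value.toString());
                    if (type != StreamTokenizer.TT_EOL) {
                        t.pushBack();  // let the outer loop see '}' or end of file
                    }
                }
                records.put(recordName, fields);
                type = skipEol(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    /** Returns the type of the next token that is not an end-of-line marker. */
    private static int skipEol(StreamTokenizer t) throws IOException {
        int type;
        do {
            type = t.nextToken();
        } while (type == StreamTokenizer.TT_EOL);
        return type;
    }
}
```

Calling DefineTokenizerParser.parse(new FileReader("students.txt")) (the file name is made up) would return one inner map per define block. Values that contain several words are joined with single spaces, so the original spacing inside a value is not preserved.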

Answer 1: (score: 0)

There are various ways of parsing text. You can use whichever best fits your needs:

String.split methods
StringTokenizer and StreamTokenizer classes
Scanner class
Pattern and Matcher classes, which implement regular expressions
For the most complex parsing tasks, you can use tools such as JavaCC
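
As one illustration of the options listed above, the following sketch applies the Pattern and Matcher route to the format from the question, one line at a time. The class name DefineRegexParser and both regular expressions are assumptions made for this example. It expects "define NAME {" on a single line, one "key value" pair per line, and a closing "}" on a line of its own.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and the patterns are illustrative only.
class DefineRegexParser {
    // "define NAME {" with any amount of whitespace between the parts
    private static final Pattern HEADER =
            Pattern.compile("^\\s*define\\s+(\\S+)\\s*\\{\\s*$");
    // First run of non-whitespace is the key; the rest of the line, trimmed, is the value
    private static final Pattern FIELD =
            Pattern.compile("^\\s*(\\S+)\\s+(.*?)\\s*$");

    static Map<String, Map<String, String>> parse(List<String> lines) {
        Map<String, Map<String, String>> records = new LinkedHashMap<>();
        Map<String, String> current = null;  // fields of the open block, or null outside a block
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;  // ignore blank lines
            }
            if (current == null) {
                Matcher header = HEADER.matcher(line);
                if (!header.matches()) {
                    throw new IllegalArgumentException("'define NAME {' expected: " + line);
                }
                current = new LinkedHashMap<>();
                records.put(header.group(1), current);
            } else if (trimmed.equals("}")) {
                current = null;  // block closed
            } else {
                Matcher field = FIELD.matcher(line);
                if (!field.matches()) {
                    throw new IllegalArgumentException("'key value' expected: " + line);
                }
                current.put(field.group(1), field.group(2));
            }
        }
        if (current != null) {
            throw new IllegalArgumentException("missing closing '}'");
        }
        return records;
    }
}
```

The lines can come from java.nio.file.Files.readAllLines. Because \s+ matches any mix of spaces and tabs, the uneven spacing in the file needs no special handling, and unlike a token-based approach the value keeps its internal spacing exactly as written.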