Is there an API I can call to parse a file in the following format?
define student {
full_name Smith,John
sex male
age 19
grade 90
class_number 8.43.1
reg_hour 5x3
}
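The file is not consistently formatted: as shown, the key and the value are separated by a varying number of spaces and tabs (\t).
Any suggestions for parsing this format in Java? Or in Python...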
Answer 0: (score: 1)
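This should be straightforward in Java with StreamTokenizer:
http://docs.oracle.com/javase/6/docs/api/java/io/StreamTokenizer.html
It skips all kinds of whitespace, but you will need to call eolIsSignificant(true), because the values do not seem to have any other delimiter.
It should look roughly like this (I'm not sure whether EOL significance can be switched on and off in the middle of parsing):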
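// Wrap the stream in a Reader; the StreamTokenizer(InputStream) constructor is deprecated.
StreamTokenizer tokenizer = new StreamTokenizer(
        new BufferedReader(new InputStreamReader(fileInputStream)));
// Treat every printable character as part of a word, so that values such as
// "Smith,John", "19", "8.43.1" and "5x3" each come back as a single word token
// (by default, numbers are parsed separately and sval would be null for them).
tokenizer.resetSyntax();
tokenizer.wordChars('!', 255);
tokenizer.whitespaceChars(0, ' ');
tokenizer.ordinaryChar('{');
tokenizer.ordinaryChar('}');
tokenizer.nextToken();
while ("define".equals(tokenizer.sval)) {
    tokenizer.nextToken();
    String recordName = tokenizer.sval;
    if (tokenizer.nextToken() != '{') {
        throw new RuntimeException("'{' expected");
    }
    while (tokenizer.nextToken() != '}') {
        if (tokenizer.ttype == StreamTokenizer.TT_EOF) {
            throw new RuntimeException("'}' expected");
        }
        String key = tokenizer.sval;
        tokenizer.nextToken();
        String value = tokenizer.sval;
        // Append any further words on the same line to the value.
        tokenizer.eolIsSignificant(true);
        int token;
        while ((token = tokenizer.nextToken()) != StreamTokenizer.TT_EOL
                && token != StreamTokenizer.TT_EOF) {
            value += " " + tokenizer.sval; // If this is common, use StringBuilder
        }
        tokenizer.eolIsSignificant(false);
        // key and value now hold one field of the record named recordName.
    }
    tokenizer.nextToken(); // Move past '}' to the next "define", if there is one.
}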
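On the question of whether EOL significance can be toggled mid-parse: as far as I can tell from the OpenJDK implementation, the flag is consulted each time nextToken() skips whitespace, so switching it between calls should work. The small check below is only a sketch to try that out; the class name EolToggleCheck, the method tokenize, and the sample input are made up for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the answer above.
class EolToggleCheck {
    /** Records the tokens seen while switching EOL significance on and off. */
    static List<String> tokenize(String text) {
        StreamTokenizer t = new StreamTokenizer(new StringReader(text));
        List<String> seen = new ArrayList<>();
        try {
            // Line ends are insignificant by default, so leading blank lines are skipped.
            t.nextToken();
            seen.add(t.sval);
            // Switch line ends on: the rest of the current line is terminated by TT_EOL.
            t.eolIsSignificant(true);
            while (t.nextToken() == StreamTokenizer.TT_WORD) {
                seen.add(t.sval);
            }
            seen.add(t.ttype == StreamTokenizer.TT_EOL ? "<EOL>" : "<other>");
            // Switch them off again: blank lines before the next word are skipped.
            t.eolIsSignificant(false);
            t.nextToken();
            seen.add(t.sval);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("\n\nkey one two\n\n\nnext\n"));
    }
}
```

If the toggle behaves as described, the list contains the three words of the first line, a single end-of-line marker, and then the word from the later line, with none of the blank lines reported.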
Answer 1: (score: 0)
There are various ways of parsing text in Java. Use whichever fits your case best:
String.split methods
StringTokenizer and StreamTokenizer classes
Scanner class
Pattern and Matcher classes, which implement regular expressions
Parser generators such as JavaCC, for the most complex parsing tasks
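To illustrate the first option on the format from the question, here is a minimal sketch that reads the text line by line and uses String.split with a limit of 2, so the first run of spaces or tabs separates key from value. The class name RecordParser and the method parse are hypothetical, and the sketch assumes that every field sits on its own line and that the closing "}" stands alone on its line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names and line-layout assumptions are for illustration only.
class RecordParser {
    // Matches a header line such as "define student {" and captures the record name.
    private static final Pattern HEADER = Pattern.compile("define\\s+(\\S+?)\\s*\\{");

    /** Returns record name -> (key -> value), preserving the order in the input. */
    static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> records = new LinkedHashMap<>();
        Map<String, String> current = null;
        for (String rawLine : text.split("\\r?\\n")) {
            String line = rawLine.trim();
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                current = new LinkedHashMap<>();
                records.put(header.group(1), current);
            } else if (line.equals("}")) {
                current = null;
            } else if (current != null && !line.isEmpty()) {
                // Split on the first run of whitespace only, so values may contain spaces.
                String[] keyValue = line.split("\\s+", 2);
                current.put(keyValue[0], keyValue.length > 1 ? keyValue[1] : "");
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "define student {\n"
                + "full_name \t Smith,John\n"
                + "age      19\n"
                + "reg_hour\t5x3\n"
                + "}\n";
        System.out.println(parse(sample));
    }
}
```

All values are kept as strings here; converting fields such as age or grade to numbers would be a separate step once the keys are known.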