I have a string:
Fields { name:"aa" type: "bb" paramA { name:"cc" } paramB { other:"ee" other_p:"ff"} paramC { name: "bb" param: "dd" other_params { abc: "xx" xyz:"yy"}} }
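My Java regex code extracts everything between the braces of paramA, paramB and other_params. I need to build some kind of Java object structure from it, but I am still stuck on extracting paramC.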
Pattern pattern = Pattern.compile("\\w+\\s(\\{([^{]*?)\\})");
Matcher matcher = pattern.matcher(theAboveString);
while (matcher.find()) {
    // group(1) is the brace block, including the braces themselves
    System.out.println(matcher.group(1));
}
That is my extraction code.
Answer 0: (score: 0)
Here is an example of parsing it with a regular expression:
// Requires: java.util.ArrayDeque, java.util.Deque,
//           java.util.regex.Matcher, java.util.regex.Pattern
String input = "Fields { name:\"aa\" type: \"bb\" paramA { name:\"cc\" } paramB { other:\"ee\" other_p:\"ff\"} paramC { name: \"bb\" param: \"dd\" other_params { abc: \"xx\" xyz:\"yy\"}} }";
// Each match is one token: `name: "value"`, `name {`, or `}`
Matcher m = Pattern.compile("\\s*(?:(\\w+)\\s*(?::\\s*(\".*?\")|\\{)|\\})\\s*").matcher(input);
int start = 0;
Deque<String> stack = new ArrayDeque<>(); // names of the currently open blocks
while (m.find()) {
    // Tokens must be contiguous; a gap means unparseable text was skipped
    if (m.start() != start)
        throw new IllegalArgumentException("Invalid data at " + start);
    if (m.group(2) != null) {
        // name: "value"
        System.out.println(stack + " : " + m.group(1) + " = " + m.group(2));
    } else if (m.group(1) != null) {
        // name {
        //System.out.println(m.group(1) + " {");
        stack.addLast(m.group(1));
    } else {
        // }
        //System.out.println("}");
        if (stack.isEmpty())
            throw new IllegalArgumentException("Unbalanced brace at " + start);
        stack.removeLast();
    }
    start = m.end();
}
if (start != input.length())
    throw new IllegalArgumentException("Invalid data at " + start);
if (! stack.isEmpty())
    throw new IllegalArgumentException("Unexpected end of text");
Output
[Fields] : name = "aa"
[Fields] : type = "bb"
[Fields, paramA] : name = "cc"
[Fields, paramB] : other = "ee"
[Fields, paramB] : other_p = "ff"
[Fields, paramC] : name = "bb"
[Fields, paramC] : param = "dd"
[Fields, paramC, other_params] : abc = "xx"
[Fields, paramC, other_params] : xyz = "yy"
You should be able to take it from here.
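Since the question asks for the result as Java objects, one possible next step is to keep maps on the stack instead of block names. The following is only a sketch under assumptions: it reuses the token regex from this answer, builds nested `LinkedHashMap`s, strips the quotes from values, and collects repeated names into a `List`. The class name `NestedFieldsParser` and the helper `put` are hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NestedFieldsParser {
    // Same token pattern as in the answer: `name: "value"`, `name {`, or `}`.
    private static final Pattern TOKEN =
            Pattern.compile("\\s*(?:(\\w+)\\s*(?::\\s*(\".*?\")|\\{)|\\})\\s*");

    /** Parses the text into nested maps (hypothetical helper, not a library API). */
    static Map<String, Object> parse(String input) {
        Map<String, Object> root = new LinkedHashMap<>();
        Deque<Map<String, Object>> stack = new ArrayDeque<>();
        stack.addLast(root); // the root map stays on the stack for the whole parse
        Matcher m = TOKEN.matcher(input);
        int start = 0;
        while (m.find()) {
            if (m.start() != start)
                throw new IllegalArgumentException("Invalid data at " + start);
            if (m.group(2) != null) {
                // name: "value" -> store the value without its quotes
                String quoted = m.group(2);
                put(stack.peekLast(), m.group(1), quoted.substring(1, quoted.length() - 1));
            } else if (m.group(1) != null) {
                // name { -> open a child map and descend into it
                Map<String, Object> child = new LinkedHashMap<>();
                put(stack.peekLast(), m.group(1), child);
                stack.addLast(child);
            } else {
                // } -> close the current map; the root itself cannot be closed
                if (stack.size() == 1)
                    throw new IllegalArgumentException("Unbalanced brace at " + start);
                stack.removeLast();
            }
            start = m.end();
        }
        if (start != input.length())
            throw new IllegalArgumentException("Invalid data at " + start);
        if (stack.size() != 1)
            throw new IllegalArgumentException("Unexpected end of text");
        return root;
    }

    // Adds a value; a repeated key turns the entry into a List of all values.
    @SuppressWarnings("unchecked")
    private static void put(Map<String, Object> target, String key, Object value) {
        Object existing = target.get(key);
        if (existing == null) {
            target.put(key, value);
        } else if (existing instanceof List) {
            ((List<Object>) existing).add(value);
        } else {
            List<Object> values = new ArrayList<>();
            values.add(existing);
            values.add(value);
            target.put(key, values);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Fields { name:\"aa\" paramA { name:\"cc\" } }"));
    }
}
```

With this shape, the nested block from the question would be reachable as `root.get("Fields")`, then `"paramC"`, then `"other_params"`, each step returning another map. Mapping those maps onto dedicated classes is left open, since the question does not say what the target objects look like.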
UPDATE
To also support numeric values, use this regex:
"\\s*(?:(\\w+)\\s*(?::\\s*(\".*?\"|[-+0-9.eE]+)|\\{)|\\})\\s*"
Testing with
"Layer { name: \"conv2\" type: \"Convolution\" bottom: \"norm1\" top: \"conv2\" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 256 pad: 2 kernel_size: 5 group: 2 weight_filler { type: \"gaussian\" std: 0.01 } bias_filler { type: \"constant\" value: 1 } }}"
produces:
[Layer] : name = "conv2"
[Layer] : type = "Convolution"
[Layer] : bottom = "norm1"
[Layer] : top = "conv2"
[Layer, param] : lr_mult = 1
[Layer, param] : decay_mult = 1
[Layer, param] : lr_mult = 2
[Layer, param] : decay_mult = 0
[Layer, convolution_param] : num_output = 256
[Layer, convolution_param] : pad = 2
[Layer, convolution_param] : kernel_size = 5
[Layer, convolution_param] : group = 2
[Layer, convolution_param, weight_filler] : type = "gaussian"
[Layer, convolution_param, weight_filler] : std = 0.01
[Layer, convolution_param, bias_filler] : type = "constant"
[Layer, convolution_param, bias_filler] : value = 1
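Note that the numeric alternative `[-+0-9.eE]+` is deliberately loose: it also accepts text such as `e` or `1.2.3`, so the captured group is still just a string. A minimal sketch of turning that raw text into a typed Java value, and rejecting malformed numbers at that point, could look like this. `ValueConverter` and `convert` are hypothetical names, not part of the code above.

```java
final class ValueConverter {
    /**
     * Converts the raw text of capture group 2 into a String (quotes removed),
     * a Long, or a Double. Throws if the text is not a valid number.
     */
    static Object convert(String raw) {
        if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
            return raw.substring(1, raw.length() - 1);
        }
        try {
            return Long.valueOf(raw); // integers such as 256 or -3
        } catch (NumberFormatException notAnInteger) {
            try {
                return Double.valueOf(raw); // decimals such as 0.01 or 1e-3
            } catch (NumberFormatException notANumber) {
                throw new IllegalArgumentException("Not a number: " + raw, notANumber);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("\"gaussian\""));
        System.out.println(convert("256"));
        System.out.println(convert("0.01"));
    }
}
```

Trying `Long` before `Double` is a design choice that keeps values like `num_output: 256` as integers; if every number should be a `Double`, the first `try` can simply be dropped.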
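Answer 1: (score: 0)
You cannot parse arbitrarily deeply nested nodes with a regular expression alone. (See Chomsky's classification of languages and automata, or any Stack Overflow question about parsing HTML with regular expressions.)
I have created a library that lets you parse things like this. It even has proper documentation.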
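To illustrate the point without any library: nesting needs a stack or recursion, which a single regular expression does not provide. The following hand-written recursive-descent sketch handles the format from the question (quoted string values only). It is an assumption-laden illustration, not the API of the library mentioned below; the class `RecursiveDescentSketch` and all of its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RecursiveDescentSketch {
    private final String text;
    private int pos = 0;

    private RecursiveDescentSketch(String text) {
        this.text = text;
    }

    /** Parses the whole text; fails if anything is left over. */
    static Map<String, Object> parse(String text) {
        RecursiveDescentSketch parser = new RecursiveDescentSketch(text);
        Map<String, Object> entries = parser.parseEntries();
        if (parser.pos != text.length())
            throw parser.error("Unexpected character");
        return entries;
    }

    // entries := ( name ( ':' string | '{' entries '}' ) )*
    // A repeated name overwrites the earlier entry in this sketch.
    private Map<String, Object> parseEntries() {
        Map<String, Object> entries = new LinkedHashMap<>();
        skipWhitespace();
        while (isNameChar(peek())) {
            int nameStart = pos;
            while (isNameChar(peek())) pos++;
            String name = text.substring(nameStart, pos);
            skipWhitespace();
            if (peek() == ':') {
                pos++;
                skipWhitespace();
                entries.put(name, parseString());
            } else if (peek() == '{') {
                pos++;
                entries.put(name, parseEntries()); // the recursion handles any depth
                expect('}');
            } else {
                throw error("Expected ':' or '{' after " + name);
            }
            skipWhitespace();
        }
        return entries;
    }

    // string := '"' any characters except '"' '"'  (no escape sequences)
    private String parseString() {
        expect('"');
        int end = text.indexOf('"', pos);
        if (end < 0) throw error("Unterminated string");
        String value = text.substring(pos, end);
        pos = end + 1;
        return value;
    }

    private void expect(char c) {
        if (peek() != c) throw error("Expected '" + c + "'");
        pos++;
    }

    // Returns the current character, or '\0' at the end of the text.
    private char peek() {
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    private void skipWhitespace() {
        while (Character.isWhitespace(peek())) pos++;
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at " + pos);
    }
}
```

Calling `RecursiveDescentSketch.parse(theAboveString)` would return a map whose `"Fields"` entry is itself a map, and so on down to `other_params`. Each `{` triggers one recursive call, so the call stack plays the role that a regular expression has no equivalent for.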
http://sourceforge.net/projects/jparser2/
Documentation: