Consider the following multiline string:
This is multiline text that needs to be correctly parsed into key-value pairs, excluding all other information.
Section One:
First key = Value One
Second key = Value Two
Section Two:
Third key = Value Three
Fourth key = Value Four
Fifth key = Value Five
Section Three:
Sixth key = Value Six
Seventh key = Value Seven
Eighth key = Value Eight
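In other words, the text contains an "introduction" (a few phrases) followed by multiple lines organised into sections. Each section has a "header" (e.g. Section One) and several key-value pairs separated by =.
Keys may contain any character except a newline and =, and values may contain any character except a newline.
Occasionally, other unrelated lines may appear in the text.
I need a regular expression that makes matcher.find() return all the key-value pair groups, and only those, skipping the introduction, the section headers, and any other line that does not contain a key-value pair.
Ideally, no additional pre-processing or post-processing of the text should be required.
Reading the text line by line and processing each line is not an option in this use case.
Patterns like (?:\r|\n)(\s*[^=\.]+)\s*=\s*(.+) come close, but they still capture more than required.
Any ideas?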
Answer 0 (score: 2)
You're almost there. Just change \s* to <space>*, because \s also matches newline characters.
(?:\r|\n) *([^\n=\.]+)(?<=\S) *= *(.+)
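If the text contains tabs, change <space>* above to [ \t]*. (?<=\S) is a positive lookbehind: it asserts that the character immediately before that position, i.e. the last character of the key, is not whitespace, so trailing spaces are kept out of the captured key.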
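To make the difference concrete, here is a minimal sketch that runs the question's pattern and the pattern from this answer against the same short input. The class name WhitespaceDemo, the helper firstKey, and the three-line sample text are assumptions made up for this illustration; only the two regular expressions come from the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceDemo {
    // Hypothetical helper: returns group(1) of the first match, or null if nothing matches.
    public static String firstKey(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened, made-up sample: an intro line, a section header, one pair.
        String text = "Intro\nSection One:\nFirst key = Value One";

        // Pattern from the question: \s and [^=\.] both accept line breaks,
        // so the key group can start on an earlier line.
        String loose = "(?:\\r|\\n)(\\s*[^=\\.]+)\\s*=\\s*(.+)";

        // Pattern from this answer: only spaces or tabs around the key,
        // and no \n allowed inside it.
        String strict = "(?:\\r|\\n)[\\t ]*([^\\n=\\.]+)(?<=\\S)[\\t ]*=[\\t ]*(.+)";

        // Brackets make leading/trailing whitespace and line breaks visible.
        System.out.println("[" + firstKey(loose, text) + "]");
        System.out.println("[" + firstKey(strict, text) + "]");
    }
}
```

With the question's pattern, the first key group swallows the section header and the line break before "First key", plus the trailing space. With the stricter pattern the group is just "First key".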
String s = "This is multiline text that needs to be correctly parsed into key-value pairs, excluding all other information.\n" +
"\n" +
" Section One:\n" +
" First key = Value One\n" +
" Second key = Value Two\n" +
"\n" +
" Section Two: \n" +
" Third key = Value Three\n" +
" Fourth key = Value Four\n" +
" Fifth key = Value Five\n" +
"\n" +
" Section Three:\n" +
" Sixth key = Value Six\n" +
" Seventh key = Value Seven\n" +
" Eighth key = Value Eight";
Matcher m = Pattern.compile("(?:\\r|\\n)[\\t ]*([^\\n=\\.]+)(?<=\\S)[\\t ]*=[\\t ]*(.+)").matcher(s);
while(m.find())
{
System.out.println("Key : "+m.group(1) + " => Value : " + m.group(2));
}
Output:
Key : First key => Value : Value One
Key : Second key => Value : Value Two
Key : Third key => Value : Value Three
Key : Fourth key => Value : Value Four
Key : Fifth key => Value : Value Five
Key : Sixth key => Value : Value Six
Key : Seventh key => Value : Value Seven
Key : Eighth key => Value : Value Eight
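
One thing to be aware of: because the pattern starts with (?:\r|\n), a key-value pair on the very first line of the input has no preceding line break and is skipped. That is fine for the text in the question, which always begins with an introduction. If it matters, a possible variant (not part of the original answer) is to anchor with ^ in MULTILINE mode instead. The sketch below assumes that variant; the class name KeyValueParser, the parse method, and the choice of a LinkedHashMap (which keeps insertion order but overwrites duplicate keys) are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueParser {
    // (?m) turns on MULTILINE, so ^ and $ match at the start and end of every line.
    // The rest mirrors the answer's pattern: optional spaces/tabs, a key without
    // \n, = or ., a lookbehind that trims trailing spaces, then the value.
    private static final Pattern PAIR = Pattern.compile(
            "(?m)^[ \\t]*([^\\n=.]+)(?<=\\S)[ \\t]*=[ \\t]*(.+)$");

    // Collects every key-value pair found in the text, in order of appearance.
    public static Map<String, String> parse(String text) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up input where the first line is already a key-value pair.
        String text = "First key = Value One\n"
                + "Section Two:\n"
                + "  Third key = Value Three\n"
                + "an unrelated line";
        parse(text).forEach((k, v) ->
                System.out.println("Key : " + k + " => Value : " + v));
    }
}
```

Like the original, this still needs no pre-processing or line-by-line reading: a single Matcher.find() loop walks the whole string.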