I want to capture the whole block from a starting header up to, but not including, the ending header. For example:
<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section2>
The match should be:
<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File
How do I write a regex pattern for this match in Java?
Answer 0 (score: 2)
If your entire input is in this format, you can simply split it:
String[] sections = input.split("\\R(?=<)");
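\R means "any line break sequence", and (?=<) is a lookahead meaning "the next character is '<'". Because the lookahead consumes nothing, each piece keeps its own header.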
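As a rough, self-contained sketch of the split approach: the class name SectionSplitDemo and the helper splitSections are made up for illustration, and the sample input is borrowed from the usage snippet further down in this answer. It assumes Java 8 or later, since \R is not available before that.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names here are illustrative only.
final class SectionSplitDemo {

    // Splits at every line break that is immediately followed by '<'.
    // The lookahead (?=<) consumes nothing, so each piece keeps its header.
    static List<String> splitSections(String input) {
        return Arrays.asList(input.split("\\R(?=<)"));
    }

    public static void main(String[] args) {
        String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\n"
                + "Grouping_File\n<section2>\nfoo";
        for (String section : splitSections(input)) {
            // Brackets make the boundaries of each piece visible.
            System.out.println("[" + section + "]");
        }
    }
}
```

Note that the line break in front of each header is consumed by the split, so the pieces carry no trailing line break.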
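If that is not the case, however, you need more of the regex toolbox: the DOTALL flag (s), so that the dot also matches line breaks, and the MULTILINE flag (m), so that ^ also matches at the start of every line.
Assuming each "section" starts with "<" at the beginning of a line: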
"(?sm)^<\\w+>(.(?!^<))*"
Here is how to use it:
String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\nGrouping_File\n<section2>\nfoo";
Matcher matcher = Pattern.compile("(?sm)^<\\w+>(.(?!^<))*").matcher(input);
while (matcher.find()) {
String section = matcher.group();
}
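The loop above reads each match but does nothing with it. A minimal runnable variant that collects the sections into a list might look like the following; the class name SectionMatchDemo and the method findSections are assumptions introduced for this sketch, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class SectionMatchDemo {

    // (?s) lets '.' match line breaks, (?m) lets '^' match at each line start.
    // (.(?!^<))* consumes a character only if the position after it is not
    // a line start followed by '<', so the match stops before the next header.
    private static final Pattern SECTION =
            Pattern.compile("(?sm)^<\\w+>(.(?!^<))*");

    static List<String> findSections(String input) {
        List<String> sections = new ArrayList<>();
        Matcher matcher = SECTION.matcher(input);
        while (matcher.find()) {
            sections.add(matcher.group());
        }
        return sections;
    }

    public static void main(String[] args) {
        String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\n"
                + "Grouping_File\n<section2>\nfoo";
        for (String section : findSections(input)) {
            System.out.println("[" + section + "]");
        }
    }
}
```

With this pattern the line break directly in front of the next header is left out of the match, so a section ends at the last character of its final line.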
Answer 1 (score: 1)
If your input looks like this:
<section1>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section2>
Base_Currency=EUR
Description=Revaluation
Grouping_File
<section3>
Base_Currency=EUR
Description=Revaluation
Grouping_File
then you can use the following regex:
(?s)(<section\d+>.*?)(?=<section\d+>|$)
The explanation of this regex is:
NODE                     EXPLANATION
--------------------------------------------------------------------------------
(?s)                     set flags for this block (with . matching
                         \n) (case-sensitive) (with ^ and $
                         matching normally) (matching whitespace
                         and # normally)
--------------------------------------------------------------------------------
(                        group and capture to \1:
--------------------------------------------------------------------------------
  <section                 '<section'
--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  >                        '>'
--------------------------------------------------------------------------------
  .*?                      any character (0 or more times (matching
                           the least amount possible))
--------------------------------------------------------------------------------
)                        end of \1
--------------------------------------------------------------------------------
(?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
  <section                 '<section'
--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  >                        '>'
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of
                           the string
--------------------------------------------------------------------------------
)                        end of look-ahead
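To try this pattern from Java, something along these lines should work. It is only a sketch: the class name LazySectionDemo and the method captureSections are invented for illustration, and the sample input assumes the three sections are separated by "\n" with no trailing line break.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class LazySectionDemo {

    // Lazy .*? grows one character at a time until the lookahead sees
    // either the next <sectionN> header or the end of the input.
    private static final Pattern SECTION =
            Pattern.compile("(?s)(<section\\d+>.*?)(?=<section\\d+>|$)");

    static List<String> captureSections(String input) {
        List<String> sections = new ArrayList<>();
        Matcher matcher = SECTION.matcher(input);
        while (matcher.find()) {
            sections.add(matcher.group(1)); // group 1 is the header plus its body
        }
        return sections;
    }

    public static void main(String[] args) {
        String body = "\nBase_Currency=EUR\nDescription=Revaluation\nGrouping_File";
        String input = "<section1>" + body + "\n<section2>" + body + "\n<section3>" + body;
        for (String section : captureSections(input)) {
            System.out.println("[" + section + "]");
        }
    }
}
```

Unlike the split-based approach, every captured section except the last keeps the line break that precedes the next header, so call trim() on the result if that matters.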
If you only want to match a single tag, you can use:
(?s)(<section\d+>[^<]*)
The explanation of this regex is:
NODE                     EXPLANATION
--------------------------------------------------------------------------------
(?s)                     set flags for this block (with . matching
                         \n) (case-sensitive) (with ^ and $
                         matching normally) (matching whitespace
                         and # normally)
--------------------------------------------------------------------------------
(                        group and capture to \1:
--------------------------------------------------------------------------------
  <section                 '<section'
--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  >                        '>'
--------------------------------------------------------------------------------
  [^<]*                    any character except: '<' (0 or more
                           times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
)                        end of \1
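A small sketch of using this single-tag pattern from Java follows; the class name FirstSectionDemo and the method firstSection are hypothetical. Two caveats: the pattern contains no '.', so the (?s) flag has no visible effect here, and because [^<]* stops at the first '<', a section whose body itself contains '<' would be cut short.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class FirstSectionDemo {

    // [^<]* consumes everything, line breaks included, up to the next '<'.
    private static final Pattern FIRST_SECTION =
            Pattern.compile("(?s)(<section\\d+>[^<]*)");

    // Returns the first section found, or an empty Optional if there is none.
    static Optional<String> firstSection(String input) {
        Matcher matcher = FIRST_SECTION.matcher(input);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\n"
                + "Grouping_File\n<section2>";
        firstSection(input).ifPresent(s -> System.out.println("[" + s + "]"));
    }
}
```

The returned text includes the line break in front of the next header, since that character is not '<'.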