我正在学习正则表达式,正在努力寻找满足以下条件的正则表达式格式:
示例:
<NoteText><![CDATA[dvsdhjkndlv <<<RED>>> <72901> </NoteText>
这应该返回 RED 之前的 3 个“<”和 72901 之前的 1 个“<”
最初我尝试使用下面的负前瞻正则表达式模式。
<(?!!)
但它也会返回“NoteText”短语之前的“<”。
我不确定如何限制“
(?:<NoteText>.*)(<(?!!)).*(?:<\/NoteText>)
答案 0 :(得分:0)
PCRE,不漂亮,但有效:
(?:\G(?!\A)|<NoteText>)(?:(?!<\/?NoteText>).)*?\K<(?!!)(?=(?:(?!<\/?NoteText>).)*?<\/NoteText>)
说明
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
\G where the last m//g left off
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\A the beginning of the string
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
<NoteText> '<NoteText>'
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the least amount possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/? '/' (optional (matching the most
amount possible))
--------------------------------------------------------------------------------
NoteText> 'NoteText>'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
)*? end of grouping
--------------------------------------------------------------------------------
\K match reset operator
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
! '!'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/? '/' (optional (matching the most
amount possible))
--------------------------------------------------------------------------------
NoteText> 'NoteText>'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
)*? end of grouping
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
NoteText> 'NoteText>'
--------------------------------------------------------------------------------
) end of look-ahead
答案 1 :(得分:0)
这是 Java 8 中的一种工作方法。请记住,只有当您没有 嵌套 <NoteText>
标记时,这才有效。
String myString = "<NoteText><![CDATA[dvsdhjkndlv <<<RED>>> <72901> </NoteText>";
Matcher outerMatcher = Pattern.compile("(?<=<NoteText>).*?(?=</NoteText>)").matcher(myString);
while (outerMatcher.find()) {
String content = outerMatcher.group(); // this is the content of the current NodeText tag
Matcher innerMatcher = Pattern.compile("<(?!!)").matcher(content);
int count = 0;
while (innerMatcher.find()) count++;
System.out.println(count); // this will print 4
}
上面的代码被认为也适用于多次出现的 <NoteText>
标签的字符串。
如果您知道自己只有一个 <NoteText>
标签,只需将 while
替换为 if
。