Question

I am learning regular expressions and am struggling to find a pattern that meets the following conditions:

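Look only at the content between "<NoteText>" and "</NoteText>".
If that content contains one or more "<" symbols that are not followed by "!", return every such "<" symbol.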

Example:

<NoteText><![CDATA[dvsdhjkndlv        <<<RED>>>  <72901> </NoteText>

This should return the 3 "<" before RED and the 1 "<" before 72901.

Initially I tried the negative lookahead pattern below.

<(?!!)

But it also returns the "<" in front of "NoteText".
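To make the over-matching concrete, here is a minimal Java sketch that runs the bare lookahead over the whole sample string. The class and method names (NaiveLookaheadDemo, countAll) are made up for illustration and are not part of the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveLookaheadDemo {
    // Counts every '<' that is not immediately followed by '!', anywhere in the input.
    static int countAll(String input) {
        Matcher m = Pattern.compile("<(?!!)").matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "<NoteText><![CDATA[dvsdhjkndlv        <<<RED>>>  <72901> </NoteText>";
        // Prints 6: the 4 wanted symbols plus the '<' of <NoteText> and of </NoteText>.
        System.out.println(countAll(input));
    }
}
```

The lookahead only inspects the character right after each "<", so it has no notion of where the NoteText region starts or ends.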

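I am not sure how to restrict the search to the region between "<NoteText>" and "</NoteText>". The following attempt did not work either.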

(?:<NoteText>.*)(<(?!!)).*(?:<\/NoteText>)
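One likely reason this attempt falls short: the whole tag pair is consumed by a single match, and the greedy .* leaves only one "<" for the capture group. The Java sketch below illustrates that under the assumption that the pattern is used as written; the names SingleMatchDemo and capturedPositions are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleMatchDemo {
    // The pattern from the question, unchanged apart from Java string escaping.
    private static final Pattern ATTEMPT =
        Pattern.compile("(?:<NoteText>.*)(<(?!!)).*(?:<\\/NoteText>)");

    // Returns the start index of capture group 1 for every match found.
    static List<Integer> capturedPositions(String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = ATTEMPT.matcher(input);
        while (m.find()) {
            positions.add(m.start(1));
        }
        return positions;
    }

    public static void main(String[] args) {
        String input = "<NoteText><![CDATA[dvsdhjkndlv        <<<RED>>>  <72901> </NoteText>";
        List<Integer> positions = capturedPositions(input);
        System.out.println(positions.size());                                // 1
        System.out.println(positions.get(0) == input.indexOf("<72901"));  // true
    }
}
```

Because the greedy .* runs to the end of the line and backtracks only as far as needed, group 1 ends up on the last qualifying "<" (the one before 72901), and the three before RED are never reported.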

Answer 1

PCRE; not pretty, but it works:

(?:\G(?!\A)|<NoteText>)(?:(?!<\/?NoteText>).)*?\K<(?!!)(?=(?:(?!<\/?NoteText>).)*?<\/NoteText>)

See the regex proof.

Explanation

--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    \G                       where the last m//g left off
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      \A                       the beginning of the string
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    <NoteText>               '<NoteText>'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the least amount possible)):
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      <                        '<'
--------------------------------------------------------------------------------
      \/?                      '/' (optional (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      NoteText>                'NoteText>'
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )*?                      end of grouping
--------------------------------------------------------------------------------
  \K                       match reset operator
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    !                        '!'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the least amount
                             possible)):
--------------------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
        <                        '<'
--------------------------------------------------------------------------------
        \/?                      '/' (optional (matching the most
                                 amount possible))
--------------------------------------------------------------------------------
        NoteText>                'NoteText>'
--------------------------------------------------------------------------------
      )                        end of look-ahead
--------------------------------------------------------------------------------
      .                        any character except \n
--------------------------------------------------------------------------------
    )*?                      end of grouping
--------------------------------------------------------------------------------
    <                        '<'
--------------------------------------------------------------------------------
    \/                       '/'
--------------------------------------------------------------------------------
    NoteText>                'NoteText>'
--------------------------------------------------------------------------------
  )                        end of look-ahead
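
The pattern above relies on \K, which java.util.regex does not support. As a rough sketch of how the same \G-based idea could be carried over to Java, the wanted "<" can be placed in a capture group instead. This is an adaptation rather than the answer's own pattern, and the names ContinuedMatchDemo and findPositions are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContinuedMatchDemo {
    // Same structure as the PCRE pattern, with group 1 standing in for \K.
    private static final Pattern LT_INSIDE_NOTETEXT = Pattern.compile(
        "(?:\\G(?!\\A)|<NoteText>)"                   // resume after the previous match, or enter at <NoteText>
        + "(?:(?!</?NoteText>).)*?"                   // skip characters without crossing a NoteText tag
        + "(<(?!!))"                                  // group 1: '<' not followed by '!'
        + "(?=(?:(?!</?NoteText>).)*?</NoteText>)");  // a closing tag must still lie ahead

    // Returns the index of every '<' (not followed by '!') inside NoteText elements.
    static List<Integer> findPositions(String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = LT_INSIDE_NOTETEXT.matcher(input);
        while (m.find()) {
            positions.add(m.start(1));
        }
        return positions;
    }

    public static void main(String[] args) {
        String input = "<NoteText><![CDATA[dvsdhjkndlv        <<<RED>>>  <72901> </NoteText>";
        System.out.println(findPositions(input).size());  // 4
    }
}
```

Each match consumes the text from the previous match up to and including the next qualifying "<", so reading m.start(1) gives the position that \K would have reported in PCRE. As in the original pattern, the dot does not match newlines unless Pattern.DOTALL is added.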

Answer 2

Here is a working approach in Java 8. Keep in mind that it only works if the <NoteText> tags are not nested.

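import java.util.regex.Matcher;
import java.util.regex.Pattern;

String myString = "<NoteText><![CDATA[dvsdhjkndlv        <<<RED>>>  <72901> </NoteText>";
// Outer pattern: the text between <NoteText> and </NoteText> (lazy, so each tag pair is handled separately)
Matcher outerMatcher = Pattern.compile("(?<=<NoteText>).*?(?=</NoteText>)").matcher(myString);
// Inner pattern: a '<' that is not followed by '!'; compiled once, outside the loop
Pattern innerPattern = Pattern.compile("<(?!!)");
while (outerMatcher.find()) {
    String content = outerMatcher.group();  // the content of the current NoteText tag
    Matcher innerMatcher = innerPattern.matcher(content);
    int count = 0;
    while (innerMatcher.find()) count++;
    System.out.println(count);  // prints 4 for the sample input
}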

The code above should also work for strings that contain more than one <NoteText> tag.

If you know there is only one <NoteText> tag, simply replace the outer while with an if.
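
Since the question asks for the "<" symbols themselves rather than a count, the same two-step idea can be sketched as a method that reports their positions in the original string. The class and method names (TwoStepPositions, findPositions) are assumptions made for this illustration, and the non-nesting caveat still applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStepPositions {
    private static final Pattern OUTER = Pattern.compile("(?<=<NoteText>).*?(?=</NoteText>)");
    private static final Pattern INNER = Pattern.compile("<(?!!)");

    // Returns the index, within the full input, of each '<' not followed by '!'
    // that lies inside a NoteText element.
    static List<Integer> findPositions(String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher outer = OUTER.matcher(input);
        while (outer.find()) {
            Matcher inner = INNER.matcher(outer.group());
            while (inner.find()) {
                // inner.start() is relative to the tag content, so shift by the content's offset.
                positions.add(outer.start() + inner.start());
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String twoTags = "<NoteText>a<b</NoteText><NoteText><!x<<</NoteText>";
        System.out.println(findPositions(twoTags));  // [11, 37, 38]
    }
}
```

Splitting the work into an outer and an inner pattern keeps both expressions short, at the cost of a second pass over each tag's content.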

Need help determining the correct regular expression format
