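Need help determining the correct regex format

Asked: 2021-05-17 17:00:33

Tags: regex regex-lookarounds

I am learning regular expressions and am struggling to find a pattern that satisfies the following conditions:

  1. Check the content between "<NoteText>" and "</NoteText>".
  2. If there are one or more "<" characters that are not followed by "!", return all of those "<" characters.

Example: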

<NoteText><![CDATA[dvsdhjkndlv        <<<RED>>>  <72901> </NoteText>

This should return the 3 "<" characters before RED and the 1 "<" before 72901.

Initially I tried the negative lookahead pattern below.

<(?!!)

But it also returns the "<" in front of "NoteText".
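
A minimal Java sketch of the over-matching, run against the example string. The class and method names are made up for illustration; only the pattern and the sample input come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveLookaheadDemo {
    // Counts every '<' that is not immediately followed by '!',
    // with no restriction on where in the input it occurs.
    static int countMatches(String input) {
        Matcher m = Pattern.compile("<(?!!)").matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "<NoteText><![CDATA[dvsdhjkndlv        <<<RED>>>  <72901> </NoteText>";
        // Prints 6: the 4 wanted matches plus the '<' of <NoteText> and of </NoteText>.
        System.out.println(countMatches(input));
    }
}
```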

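I am not sure how to restrict the search to the region between "<NoteText>" and "</NoteText>". The following attempt did not work either.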

(?:<NoteText>.*)(<(?!!)).*(?:<\/NoteText>)
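
One likely reason this attempt falls short: the pattern consumes the whole element in a single match, and a capturing group keeps only one capture per match. The Java sketch below (hypothetical class and method names) lists where group 1 starts for each match; with the sample input it finds a single match whose group 1 is the "<" before 72901.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleCaptureDemo {
    // Returns the start index of capture group 1 for every match of the attempted pattern.
    static List<Integer> groupStarts(String input) {
        Pattern p = Pattern.compile("(?:<NoteText>.*)(<(?!!)).*(?:<\\/NoteText>)");
        Matcher m = p.matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start(1));
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "<NoteText><![CDATA[dvsdhjkndlv        <<<RED>>>  <72901> </NoteText>";
        List<Integer> starts = groupStarts(input);
        // One match only; the greedy .* pushes group 1 to the last usable '<'.
        System.out.println(starts.size() + " match(es), group 1 at " + starts);
        System.out.println(input.indexOf("<72901"));
    }
}
```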

2 answers:

Answer 0 (score: 0)

PCRE, not pretty, but it works:

(?:\G(?!\A)|<NoteText>)(?:(?!<\/?NoteText>).)*?\K<(?!!)(?=(?:(?!<\/?NoteText>).)*?<\/NoteText>)

Explanation

--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    \G                       where the last m//g left off
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      \A                       the beginning of the string
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    <NoteText>               '<NoteText>'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the least amount possible)):
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      <                        '<'
--------------------------------------------------------------------------------
      \/?                      '/' (optional (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      NoteText>                'NoteText>'
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )*?                      end of grouping
--------------------------------------------------------------------------------
  \K                       match reset operator
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    !                        '!'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the least amount
                             possible)):
--------------------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
        <                        '<'
--------------------------------------------------------------------------------
        \/?                      '/' (optional (matching the most
                                 amount possible))
--------------------------------------------------------------------------------
        NoteText>                'NoteText>'
--------------------------------------------------------------------------------
      )                        end of look-ahead
--------------------------------------------------------------------------------
      .                        any character except \n
--------------------------------------------------------------------------------
    )*?                      end of grouping
--------------------------------------------------------------------------------
    <                        '<'
--------------------------------------------------------------------------------
    \/                       '/'
--------------------------------------------------------------------------------
    NoteText>                'NoteText>'
--------------------------------------------------------------------------------
  )                        end of look-ahead
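
Java's java.util.regex supports \G but has no \K, so the pattern above cannot be used there unchanged. As a rough sketch of one possible adaptation (not part of the original answer), the \K can be dropped and the "<" wrapped in capture group 1, reading positions from start(1). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChainedMatchDemo {
    // Same structure as the PCRE pattern, with \K replaced by capture group 1.
    private static final Pattern INSIDE_NOTETEXT = Pattern.compile(
        "(?:\\G(?!\\A)|<NoteText>)"                      // continue from last match, or enter a tag
        + "(?:(?!</?NoteText>).)*?"                      // skip ahead without crossing a tag boundary
        + "(<(?!!))"                                     // group 1: '<' not followed by '!'
        + "(?=(?:(?!</?NoteText>).)*?</NoteText>)");     // a closing tag must still follow

    // Returns the index of every '<' inside <NoteText>...</NoteText> that is not followed by '!'.
    static List<Integer> findPositions(String input) {
        Matcher m = INSIDE_NOTETEXT.matcher(input);
        List<Integer> positions = new ArrayList<>();
        while (m.find()) {
            positions.add(m.start(1));
        }
        return positions;
    }

    public static void main(String[] args) {
        String input = "<NoteText><![CDATA[dvsdhjkndlv        <<<RED>>>  <72901> </NoteText>";
        // Four positions: three for "<<<" and one for "<72901".
        System.out.println(findPositions(input));
    }
}
```

Because "." does not match line terminators by default, this sketch assumes each element sits on one line.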

Answer 1 (score: 0)

Here is a working approach in Java 8. Keep in mind that it only works if you do not have nested <NoteText> tags.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String myString = "<NoteText><![CDATA[dvsdhjkndlv        <<<RED>>>  <72901> </NoteText>";
// Outer pattern: everything between <NoteText> and the nearest </NoteText>.
Matcher outerMatcher = Pattern.compile("(?<=<NoteText>).*?(?=</NoteText>)").matcher(myString);
while (outerMatcher.find()) {
    String content = outerMatcher.group();  // this is the content of the current NoteText tag
    // Inner pattern: every '<' that is not followed by '!'.
    Matcher innerMatcher = Pattern.compile("<(?!!)").matcher(content);
    int count = 0;
    while (innerMatcher.find()) count++;
    System.out.println(count);  // this will print 4
}

The code above should also work for strings that contain multiple <NoteText> tags.

If you know you have only one <NoteText> tag, simply replace the outer while with an if.
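
As a hedged illustration of the multi-tag case, the same two patterns can be wrapped in a helper that returns one count per <NoteText> element. The class name, method name, and second test string are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NoteTextCounter {
    // Compiled once and reused; both patterns are taken from the answer above.
    private static final Pattern OUTER = Pattern.compile("(?<=<NoteText>).*?(?=</NoteText>)");
    private static final Pattern INNER = Pattern.compile("<(?!!)");

    // Returns, for each non-nested <NoteText> element, the number of '<' not followed by '!'.
    static List<Integer> countPerTag(String input) {
        List<Integer> counts = new ArrayList<>();
        Matcher outer = OUTER.matcher(input);
        while (outer.find()) {
            Matcher inner = INNER.matcher(outer.group());
            int count = 0;
            while (inner.find()) {
                count++;
            }
            counts.add(count);
        }
        return counts;
    }

    public static void main(String[] args) {
        String two = "<NoteText><a <b</NoteText><x><NoteText><![CDATA[ <c </NoteText>";
        System.out.println(countPerTag(two));  // prints [2, 1]
    }
}
```

The "<x>" between the two elements is not counted, since only text captured by the outer pattern reaches the inner matcher.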