Match only the first occurrence of a phrase

Date: 2016-05-20 21:42:40

Tags: regex apache-nifi

I have the following JSON:

{"field1": "someText",
  "field2": "Text Again",
  "field3": "Text Again"}

I need to match the first occurrence of any phrase that starts with a capital letter (e.g. "Text Again").

I wrote the following:

("[A-Za-z]+\s[A-Za-z]+")

It works correctly when I test it with, for example, https://regex101.com/. However, it does not seem to work properly when used with ReplaceTextWithMapping (Apache NiFi). Is the regex wrong?
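As a rough check outside NiFi, the sketch below runs the question's pattern with Java's java.util.regex. NiFi runs on the JVM, but this is not claimed to be how ReplaceTextWithMapping applies the pattern internally; the class and method names are hypothetical. Calling Matcher.find() only once is what restricts the result to the first occurrence.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstPhraseMatch {
    // The question's pattern, escaped for a Java string literal:
    // a double-quoted pair of letter-only words separated by one whitespace character.
    static final Pattern PHRASE = Pattern.compile("(\"[A-Za-z]+\\s[A-Za-z]+\")");

    // Returns capture group 1 of the first match, or null if nothing matches.
    // find() is called once, so later occurrences are never examined.
    static String firstMatch(String input) {
        Matcher m = PHRASE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String json = "{\"field1\": \"someText\",\n"
                + "  \"field2\": \"Text Again\",\n"
                + "  \"field3\": \"Text Again\"}";
        // The quotes are inside the capture group, so they are part of the result.
        System.out.println(firstMatch(json)); // prints: "Text Again"
    }
}
```

Note that this pattern does not actually require a capital letter; it only requires two letter-only words inside quotes.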

Thanks for your help.

2 Answers:

Answer 0 (score: 1)

Description

:\s*"\s*(?=[A-Z])(?![^"]*?\s[a-z])([A-Za-z\s]+)"


This regular expression does the following:

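  • finds the first title-case string on the value side of what appears to be a JSON-encoded string
  • ensures every word is capitalized
  • returns the value inside the quotes as capture group 1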
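A minimal sketch, assuming plain java.util.regex rather than NiFi itself, of the difference between the whole match (group 0) and capture group 1 when the answer's pattern is applied to the question's JSON. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleCaseValue {
    // Answer 0's pattern, escaped for a Java string literal.
    static final Pattern TITLE_CASE_VALUE =
            Pattern.compile(":\\s*\"\\s*(?=[A-Z])(?![^\"]*?\\s[a-z])([A-Za-z\\s]+)\"");

    // Returns {group 0, group 1} for the first match, or null if there is none.
    static String[] firstMatchGroups(String json) {
        Matcher m = TITLE_CASE_VALUE.matcher(json);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(0), m.group(1)};
    }

    public static void main(String[] args) {
        String json = "{\"field1\": \"someText\",\n"
                + "  \"field2\": \"Text Again\",\n"
                + "  \"field3\": \"Text Again\"}";
        String[] groups = firstMatchGroups(json);
        // Group 0 includes the colon, the space and both quotes; group 1 is only the value.
        System.out.println("group 0: [" + groups[0] + "]"); // prints: group 0: [: "Text Again"]
        System.out.println("group 1: [" + groups[1] + "]"); // prints: group 1: [Text Again]
    }
}
```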

Example

Live Demo

https://regex101.com/r/eO0xW6/1

Source string

{"field1": "someText",
  "field2": "Text again",
  "field3": "Text Again"}

First match

Text Again
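The demo result can be reproduced roughly with java.util.regex, assuming Java's regex flavor behaves like regex101's for this pattern (the class and method names are hypothetical). The sketch reports which line the first match starts on, showing that field2's "Text again" is skipped and field3's value is the one captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiveDemoCheck {
    // Answer 0's pattern, escaped for a Java string literal.
    static final Pattern TITLE_CASE_VALUE =
            Pattern.compile(":\\s*\"\\s*(?=[A-Z])(?![^\"]*?\\s[a-z])([A-Za-z\\s]+)\"");

    // Describes the first match as "line N: <group 1>", where N is the
    // 1-based line on which the match starts.
    static String describeFirstMatch(String json) {
        Matcher m = TITLE_CASE_VALUE.matcher(json);
        if (!m.find()) {
            return "no match";
        }
        int line = 1;
        for (int i = 0; i < m.start(); i++) {
            if (json.charAt(i) == '\n') {
                line++;
            }
        }
        return "line " + line + ": " + m.group(1);
    }

    public static void main(String[] args) {
        // The live demo's source string: field2 is "Text again" (lowercase a).
        String json = "{\"field1\": \"someText\",\n"
                + "  \"field2\": \"Text again\",\n"
                + "  \"field3\": \"Text Again\"}";
        System.out.println(describeFirstMatch(json)); // prints: line 3: Text Again
    }
}
```

On the question's original JSON, where field2 is already "Text Again", the same call reports line 2 instead.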

Explanation

Summary

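  • :\s*" ensures that only the value side of the JSON is checked
  • \s* matches any whitespace after the opening quote, if present
  • (?=[A-Z]) ensures that the first character of the string is uppercase
  • (?![^"]*?\s[a-z]) looks for any whitespace followed by a lowercase character; if one is found, this is not a match
  • ([A-Za-z\s]+) captures all the characters inside the quotes
  • " matches the closing quote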
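To see each part of the summary in isolation, the hypothetical helper below (plain java.util.regex, not NiFi) wraps a single value as : "value" and checks whether the answer's pattern accepts it. The last input is an assumed extra case showing that the character class only admits letters and whitespace, so a digit in the value prevents a match.

```java
import java.util.regex.Pattern;

public class LookaheadEffects {
    // Answer 0's pattern, escaped for a Java string literal.
    static final Pattern TITLE_CASE_VALUE =
            Pattern.compile(":\\s*\"\\s*(?=[A-Z])(?![^\"]*?\\s[a-z])([A-Za-z\\s]+)\"");

    // Builds the value side of a JSON pair, e.g. : "Text Again",
    // and reports whether the pattern finds a match in it.
    static boolean accepts(String value) {
        return TITLE_CASE_VALUE.matcher(": \"" + value + "\"").find();
    }

    public static void main(String[] args) {
        String[] values = {
            "someText",      // rejected by (?=[A-Z]): first character is lowercase
            "Text again",    // rejected by (?![^"]*?\s[a-z]): a space is followed by 'a'
            "Text Again",    // accepted
            "  Text Again",  // accepted: \s* after the opening quote skips the leading spaces
            "Text Again 2"   // rejected: '2' is not in [A-Za-z\s], so the closing quote never follows
        };
        for (String v : values) {
            System.out.println("\"" + v + "\" -> " + accepts(v));
        }
    }
}
```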

Detailed

NODE                     EXPLANATION
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    [A-Z]                    any character of: 'A' to 'Z'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    [^"]*?                   any character except: '"' (0 or more
                             times (matching the least amount
                             possible))
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    [a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [A-Za-z\s]+              any character of: 'A' to 'Z', 'a' to
                             'z', whitespace (\n, \r, \t, \f, and "
                             ") (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------

Answer 1 (score: 1)

I have posted my findings on this issue to the Apache NiFi mailing list:

http://apache-nifi-developer-list.39713.n7.nabble.com/Issues-with-Regex-used-with-ReplaceTextWithMapping-where-am-I-going-wrong-tc10592.html

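I have not received any confirmation from the community yet, but it seems to me that although the regex [A-Z][A-Za-z]*\s[A-Z][A-Za-z]* is correct for this case, the processor (ReplaceTextWithMapping) does not handle whitespace (\s) well, and the string contains a space between the two words.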
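A small sketch supporting the claim that the pattern itself is fine: run with plain java.util.regex (outside NiFi, so it says nothing about how ReplaceTextWithMapping treats the pattern), it finds "Text Again" as the first match in the question's JSON. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoCapitalizedWords {
    // Answer 1's pattern: a capitalized word, one whitespace character,
    // then another capitalized word.
    static final Pattern TWO_WORDS = Pattern.compile("[A-Z][A-Za-z]*\\s[A-Z][A-Za-z]*");

    // Returns the first match in the input, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = TWO_WORDS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String json = "{\"field1\": \"someText\",\n"
                + "  \"field2\": \"Text Again\",\n"
                + "  \"field3\": \"Text Again\"}";
        System.out.println(firstMatch(json));                      // prints: Text Again
        System.out.println(firstMatch("{\"a\": \"Text again\"}")); // prints: null
    }
}
```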