Regular expression to match a specific JSON structure in a log stream; a non-capturing group may or may not be present

Time: 2018-02-15 05:36:52

Tags: arrays json regex regex-group regex-greedy

I have a JSON structure (let's call it the service object; it contains one or more service_items):

{
  "service_charge": 7500,
  "person_id": 2,
  "service_items": [{
    "line_number": 1,
    "date_of_service": "2018-02-12",
    "provider_id": "YYYYYYY",
    "item_code": "XXXX",
    "service_type": "BBBBBBB",
    "provider_type": "CCCCCCCCC",
    "service_count": 5,
    "validation": {
      "third_party": {
        "rebates": 2200,
        "item_response": "pass"
      },
      "personal": {
        "rebates": null,
        "item_response": "fail"
      }
    }
  },{
    "line_number": 2,
    "date_of_service": "2018-02-12",
    "provider_id": "YYYYYYY",
    "item_code": "XXXX",
    "service_type": "Ancillary",
    "provider_type": "CCCCCCCCC",
    "service_count": 1,
    "validation": {
      "third_party": {
        "rebates": 2200,
        "item_response": "pass"
      },
      "personal": {
        "rebates": null,
        "item_response": "fail",
        "personal_log": [
          {
            "decision_type": "business_rule_x",
            "decision": "not allowed",
            "outcome": "fail",
            "rule_id": "12345",
            "narrative": "not allowed"
          }
        ]
      }
    }
  }
  ]
}

I am trying to capture the individual service_item objects from it with the following regular expression:

(?<service_item>\{[^{}]+(?:\{[^{}]*(?:\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}[^{}]*)*[^{}]*\}[^{}]*)*\})

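Problem: the personal_log array (visible in the second service item object) is optional and may or may not be present. If at least one personal_log is present, the regex works fine and captures the individual service_items. But if personal_log appears in none of the service_items (as in the JSON below), it matches the entire service object as a single match.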

{
  "service_charge": 7500,
  "person_id": 2,
  "service_items": [{
    "line_number": 1,
    "date_of_service": "2018-02-12",
    "provider_id": "YYYYYYY",
    "item_code": "XXXX",
    "service_type": "BBBBBBB",
    "provider_type": "CCCCCCCCC",
    "service_count": 5,
    "validation": {
      "third_party": {
        "rebates": 2200,
        "item_response": "pass"
      },
      "personal": {
        "rebates": null,
        "item_response": "fail"
      }
    }
  },{
    "line_number": 2,
    "date_of_service": "2018-02-12",
    "provider_id": "YYYYYYY",
    "item_code": "XXXX",
    "service_type": "Ancillary",
    "provider_type": "CCCCCCCCC",
    "service_count": 1,
    "validation": {
      "third_party": {
        "rebates": 2200,
        "item_response": "pass"
      },
      "personal": {
        "rebates": null,
        "item_response": "fail"
      }
    }
  }
  ]
}

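I want to capture the service_items whether or not the personal_log JSON array is present. I know it has something to do with the innermost non-capturing group, but so far I have not been able to work it out.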

Note: the attributes may appear in any order in the log stream.

Any help would be greatly appreciated :)
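For anyone who wants to reproduce the behaviour described above, here is a minimal sketch in Python. It makes a few assumptions that are not in the question: the named group is rewritten in Python's `(?P<name>...)` syntax, and `make_doc` and `captures` are hypothetical helpers that build a trimmed-down service object. The pattern allows at most three levels of nested braces inside the outer pair, so when no personal_log is present the whole service object is shallow enough to match first.

```python
import json
import re

# The question's pattern; only the group syntax is changed to Python's (?P<name>...).
PATTERN = re.compile(
    r"(?P<service_item>\{[^{}]+(?:\{[^{}]*(?:\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}[^{}]*)*[^{}]*\}[^{}]*)*\})"
)


def make_doc(with_log):
    """Build a trimmed-down service object (hypothetical test data)."""
    def item(n, log):
        personal = {"rebates": None, "item_response": "fail"}
        if log:
            # Adds one more level of braces below "personal".
            personal["personal_log"] = [{"rule_id": "12345", "outcome": "fail"}]
        return {
            "line_number": n,
            "validation": {
                "third_party": {"rebates": 2200, "item_response": "pass"},
                "personal": personal,
            },
        }

    doc = {
        "service_charge": 7500,
        "person_id": 2,
        "service_items": [item(1, False), item(2, with_log)],
    }
    return json.dumps(doc, indent=2)


def captures(text):
    """Return the text of every service_item capture found in `text`."""
    return [m.group("service_item") for m in PATTERN.finditer(text)]


with_log = make_doc(True)
without_log = make_doc(False)

# With a personal_log the root object is too deep, so each item matches on its own.
print(len(captures(with_log)))
# Without it, the single match is the entire service object.
print(captures(without_log) == [without_log])
```

In other words, the pattern matches the first opening brace whose contents fit within its depth limit, and that is the root object whenever no item carries a personal_log.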

2 answers:

Answer 0: (score: 0)

You could try adding the first attribute, "line_number", to the regex as an anchor:

(?<service_item>\{[^{}]+"line_number":[^{}]+(?:\{[^{}]*(?:\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}[^{}]*)*[^{}]*\}[^{}]*)*\})
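A rough way to check this anchored pattern in Python is sketched below. The assumptions: the group uses Python's `(?P<name>...)` syntax, and `make_doc` and `line_numbers` are hypothetical helpers built around simplified test data. The last call hints at a limitation: the anchor only works when "line_number" appears before any nested object inside the item, which the question's note about attribute order does not guarantee.

```python
import json
import re

# Anchored pattern from this answer, with the group syntax adapted for Python.
ANCHORED = re.compile(
    r'(?P<service_item>\{[^{}]+"line_number":[^{}]+(?:\{[^{}]*(?:\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}[^{}]*)*[^{}]*\}[^{}]*)*\})'
)


def make_doc(with_log, line_number_first=True):
    """Build a simplified service object (hypothetical test data)."""
    items = []
    for n in (1, 2):
        personal = {"rebates": None, "item_response": "fail"}
        if with_log and n == 2:
            personal["personal_log"] = [{"rule_id": "12345", "outcome": "fail"}]
        validation = {"third_party": {"rebates": 2200}, "personal": personal}
        if line_number_first:
            items.append({"line_number": n, "validation": validation})
        else:
            # Same data, but the anchor attribute comes after the nested object.
            items.append({"validation": validation, "line_number": n})
    return json.dumps({"person_id": 2, "service_items": items}, indent=2)


def line_numbers(text):
    """Parse each captured item and return its line_number."""
    return [
        json.loads(m.group("service_item"))["line_number"]
        for m in ANCHORED.finditer(text)
    ]


print(line_numbers(make_doc(with_log=True)))
print(line_numbers(make_doc(with_log=False)))
print(line_numbers(make_doc(with_log=False, line_number_first=False)))
```

Note that the pattern also needs at least one character (for example whitespace) between the item's opening brace and "line_number", because of the leading `[^{}]+`.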

Answer 1: (score: 0)

A parser would be the right choice, but if your regex engine supports it, you can use recursion to match the balanced brackets:

\[[^\[\]]*+(?:(?R)[^\[\]]*+)*+\]

regex101 testcase
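As an illustration of the parser route mentioned at the start of this answer, here is a minimal sketch using Python's standard `json` module. It is not the recursive regex itself: Python's built-in `re` module does not support `(?R)`. The function names `iter_json_objects` and `iter_service_items` and the sample log text are hypothetical; `json.JSONDecoder.raw_decode` is the standard-library call that parses one JSON value starting at a given index and reports where it ended.

```python
import json


def iter_json_objects(text):
    """Yield every JSON object that can be decoded from free-form log text."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return
        try:
            # raw_decode returns the parsed value and the index just past it.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here, try the next brace
            continue
        yield obj
        pos = end


def iter_service_items(text):
    """Yield each service item, whatever its nesting depth or key order."""
    for obj in iter_json_objects(text):
        if isinstance(obj, dict):
            yield from obj.get("service_items", [])


# Hypothetical log text with one embedded service object and some noise.
service = {
    "person_id": 2,
    "service_items": [
        {"line_number": 1, "validation": {"personal": {"item_response": "fail"}}},
        {"validation": {"personal": {"personal_log": [{"rule_id": "12345"}]}},
         "line_number": 2},
    ],
}
log = "INFO request=" + json.dumps(service) + " done\nWARN {not json}\n"

items = list(iter_service_items(log))
print([item["line_number"] for item in items])
```

Because the decoder tracks nesting itself, the optional personal_log array and the attribute order make no difference to what gets extracted.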