Question

I'm having trouble extracting a specific variable from a large text log.

A typical log looks like this:

 metadata {
    unique_id: "88dvsq113-0dcf-410f-84fb-d342076def6f"
    webhook_response_time: 155
    intent_name: "Dogs are the best"
    variable_one: "true"
    variable_two: "false"
    variable_three: "false"
  }

I only want to extract the intent_name variable, so I use this regular expression:

SELECT REGEXP_EXTRACT(textPayload, r"intent_name:(.+)") AS intent_name FROM table1
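Here is a minimal sketch of what that pattern captures. It uses Python's `re` module as a stand-in for BigQuery's RE2 engine, assuming the two behave the same for these simple patterns and that the payload keeps its line breaks as shown above. `LOG_1`, `extract_loose`, and `extract_value` are illustrative names, not part of the question.

```python
import re

# First log format from the question, kept as a multi-line string (assumption).
LOG_1 = """metadata {
    unique_id: "88dvsq113-0dcf-410f-84fb-d342076def6f"
    webhook_response_time: 155
    intent_name: "Dogs are the best"
    variable_one: "true"
    variable_two: "false"
    variable_three: "false"
  }"""

def extract_loose(payload):
    # Same pattern as the question: everything after "intent_name:" up to the newline.
    m = re.search(r"intent_name:(.+)", payload)
    return m.group(1) if m else None

def extract_value(payload):
    # Tighter pattern: capture only the characters between the quotes.
    m = re.search(r'intent_name: "([^"]+)"', payload)
    return m.group(1) if m else None

print(repr(extract_loose(LOG_1)))  # ' "Dogs are the best"'
print(extract_value(LOG_1))        # Dogs are the best
```

Because `.` does not match a newline, `(.+)` stops at the end of the line, but it still includes the leading space and the quotes; the `"([^"]+)"` group returns only the text.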

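which extracts only the "Dogs are the best" value. Now, however, the log has two different sections that contain the phrase "intent_name", so this regex no longer extracts what I need. Here is an example of the new log: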

  metadata {
    intent_id: "a664f00f-8105-4e09-bc34-2836dbe89ee1"
    webhook_response_time: 105
    intent_name: "Dogs are the best"
    execution_sequence {
      intent_id: "e231c181-31d9-4bfa-b2d8-7a52314bc628"
      intent_name: "Cats are the best"
      variable_one: "true"
      variable_two: "false"
      variable_three: "false"
    }

How can I write an expression that extracts only the first intent_name value, "Dogs are the best", and not the one inside the execution_sequence braces?
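To make the ambiguity concrete, here is a small Python sketch, again using `re` as an assumed stand-in for RE2 (`LOG_2`, `values`, `flat`, and `loose` are illustrative names). It lists every `intent_name` value in the second log and shows one plausible reason the original pattern fails: if `textPayload` is stored on a single line, as in the answer's test data, `(.+)` no longer stops at a newline and runs to the end of the record.

```python
import re

# Second log format from the question: the outer intent_name comes first,
# the nested one sits inside execution_sequence { ... }.
LOG_2 = """metadata {
    intent_id: "a664f00f-8105-4e09-bc34-2836dbe89ee1"
    webhook_response_time: 105
    intent_name: "Dogs are the best"
    execution_sequence {
      intent_id: "e231c181-31d9-4bfa-b2d8-7a52314bc628"
      intent_name: "Cats are the best"
      variable_one: "true"
      variable_two: "false"
      variable_three: "false"
    }"""

# findall lists every occurrence, making the two candidates visible.
values = re.findall(r'intent_name: "([^"]+)"', LOG_2)
print(values)  # ['Dogs are the best', 'Cats are the best']

# Collapse all whitespace to single spaces to simulate a one-line payload (assumption).
flat = " ".join(LOG_2.split())
loose = re.search(r"intent_name:(.+)", flat).group(1)
print("Cats are the best" in loose)  # True
```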

Answer 1

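This would be much easier if the payload were JSON. But for the second log format, you can do the following:

select regexp_extract(textPayload, r"""intent_name: ("[^"]+")[\s\S]*execution_sequence""")
from (select '''metadata { unique_id: "88dvsq113-0dcf-410f-84fb-d342076def6f" webhook_response_time: 155 intent_name: "Dogs are the best" variable_one: "true" variable_two: "false" variable_three: "false" }''' as textPayload
      union all
      select '''metadata { intent_id: "a664f00f-8105-4e09-bc34-2836dbe89ee1" webhook_response_time: 105 intent_name: "Dogs are the best" execution_sequence { intent_id: "e231c181-31d9-4bfa-b2d8-7a52314bc628" intent_name: "Cats are the best" variable_one: "true" variable_two: "false" variable_three: "false" }'''
     ) x

The pattern only matches an intent_name value that is followed, somewhere later, by execution_sequence, so for the second format it returns the outer value "Dogs are the best" (including the quotes, because they are inside the capture group).

This does not work for the first format, which has no execution_sequence, so the function returns NULL there. If you need to support both formats, you will need an additional expression.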
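The following Python sketch (using `re` as an assumed stand-in for RE2; all names are illustrative) runs the answer's pattern against both flattened formats from the test data. It also tries an alternative that is not part of the answer: take the first `intent_name` after `metadata {` without crossing any `{` or `}`, which in these samples picks the outer value in both formats.

```python
import re

# Both formats flattened to one line, mirroring the answer's test data.
FORMAT_1 = ('metadata { unique_id: "88dvsq113-0dcf-410f-84fb-d342076def6f" '
            'webhook_response_time: 155 intent_name: "Dogs are the best" '
            'variable_one: "true" variable_two: "false" variable_three: "false" }')
FORMAT_2 = ('metadata { intent_id: "a664f00f-8105-4e09-bc34-2836dbe89ee1" '
            'webhook_response_time: 105 intent_name: "Dogs are the best" '
            'execution_sequence { intent_id: "e231c181-31d9-4bfa-b2d8-7a52314bc628" '
            'intent_name: "Cats are the best" variable_one: "true" '
            'variable_two: "false" variable_three: "false" }')

# The answer's pattern: requires execution_sequence after the first intent_name.
ANSWER_RE = r'intent_name: ("[^"]+")[\s\S]*execution_sequence'

# Hypothetical alternative (not from the answer): the first intent_name inside
# "metadata {" reached without passing any nested "{" or "}".
TOP_LEVEL_RE = r'metadata \{[^{}]*?intent_name: "([^"]+)"'

def first_group(pattern, payload):
    # Return capture group 1 of the leftmost match, or None when nothing matches.
    m = re.search(pattern, payload)
    return m.group(1) if m else None

for payload in (FORMAT_1, FORMAT_2):
    print(first_group(ANSWER_RE, payload), first_group(TOP_LEVEL_RE, payload))
# None Dogs are the best
# "Dogs are the best" Dogs are the best
```

The alternative assumes the outer `intent_name` appears before any nested block, as it does in both samples; if it came after `execution_sequence { ... }`, the pattern would find no match. RE2 supports the lazy `*?` quantifier and the escaped brace, so the same pattern string should carry over to `REGEXP_EXTRACT` unchanged.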

Using REGEXP_EXTRACT to extract one of two repeated values but not the other
