Parsing XML with a recursive hierarchy in Snowflake

Time: 2021-04-25 15:30:14

Tags: xml snowflake-cloud-data-platform

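I am trying to parse the following XML, which has a recursive hierarchy. I can only loop once, and the second survey's data never gets populated. I am also getting NULL values in the columns.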

<DATA_EXPORT>
<SURVEYDATA>
    <SURVEY_ID>1</SURVEY_ID>
    <CLIENT_ID>ABC</CLIENT_ID>
    <COMMENTS>
      <RESPONSE>
        <QUESTION>Do you drink?</QUESTION>
        <ANSWER>Yes</ANSWER>
      </RESPONSE>
    </COMMENTS>
    <COMMENTS>
      <RESPONSE>
        <QUESTION>Do you Smoke?</QUESTION>
        <ANSWER>Yes</ANSWER>
      </RESPONSE>
    </COMMENTS>
</SURVEYDATA>
<SURVEYDATA>
    <SURVEY_ID>2</SURVEY_ID>
    <CLIENT_ID>DEF</CLIENT_ID>
    <COMMENTS>
      <RESPONSE>
        <QUESTION>Do you drink?</QUESTION>
        <ANSWER>No</ANSWER>
      </RESPONSE>
    </COMMENTS>
</SURVEYDATA>
</DATA_EXPORT>

Query used:

SELECT 
GET(XMLGET(XMLGET(TEST_XML_1, 'SURVEYDATA'),'SURVEY_ID'), '$') AS SURVEY_ID,
GET(XMLGET(D.VALUE, 'QUESTION'), '$') AS QUESTION,
GET(XMLGET(D.VALUE, 'ANSWER'), '$') AS ANSWER
FROM DATA,
LATERAL FLATTEN (GET(XMLGET(TEST_XML_1, 'SURVEYDATA', 0), '$'))D;

The output I get is:

SURVEY_ID  QUESTION  ANSWER
1          NULL      NULL
1          NULL      NULL
1          NULL      NULL
1          NULL      NULL

The output I expect is:

SURVEY_ID  QUESTION       ANSWER
1          Do you drink?  Yes
1          Do you Smoke?  Yes
2          Do you drink?  No

1 answer:

Answer 0 (score: 1)

So it looks like you want to loop across the objects inside DATA_EXPORT. To do that you need to get at its contents, which GET(xml, '$') gives you, so the query below gives you the two rows of SURVEYDATA:

SELECT q.*
FROM TEST_XML,
  LATERAL FLATTEN(GET(src_xml, '$')) q;

Assuming you want survey_id and client_id, let's now pull those out, plus the nested comments, so we can see we are getting the data we want:

SELECT 
    get(XMLGET(q.value, 'SURVEY_ID'), '$') as survey_id
    ,get(XMLGET(q.value, 'CLIENT_ID'), '$') as client_id
    ,XMLGET(q.value, 'COMMENTS') as comments
FROM TEST_XML,
  LATERAL FLATTEN(GET(src_xml, '$')) q;

But we notice this only has one COMMENTS per survey, because XMLGET returns a single instance of the tag (the first, unless you pass an instance number). So we need to loop, not across the comments, but across all the child objects of SURVEYDATA, keeping only the COMMENTS ones:

SELECT 
    get(XMLGET(q.value, 'SURVEY_ID'), '$') as survey_id
    ,get(XMLGET(q.value, 'CLIENT_ID'), '$') as client_id
    ,XMLGET(q.value, 'COMMENTS') as comments
    ,get(q.value, '$')
    ,c.*
FROM TEST_XML,
  LATERAL FLATTEN(GET(src_xml, '$')) q,
  LATERAL FLATTEN(get(q.value, '$')) c
WHERE get(c.value, '@')='COMMENTS';

Now we can unpack the comment values we want:

SELECT 
    get(XMLGET(q.value, 'SURVEY_ID'), '$') as survey_id
    ,get(XMLGET(q.value, 'CLIENT_ID'), '$') as client_id
    ,c.value
    ,XMLGET(c.value, 'RESPONSE') as resp
    ,get(XMLGET(resp, 'QUESTION'), '$') as question
    ,get(XMLGET(resp, 'ANSWER'), '$' ) as answer
FROM TEST_XML,
  LATERAL FLATTEN(GET(src_xml, '$')) q,
  LATERAL FLATTEN(get(q.value, '$')) c
WHERE get(c.value, '@')='COMMENTS';

So now we can see we have all the values we want, and we can squish the SQL up a little so it no longer has the in-between values we used to help us work it out.

That gives the final SQL, with the data in a CTE to help with testing:

with TEST_XML as (
  select parse_xml('<DATA_EXPORT>
  <SURVEYDATA>
    <SURVEY_ID>1</SURVEY_ID>
    <CLIENT_ID>ABC</CLIENT_ID>
    <COMMENTS>
      <RESPONSE>
        <QUESTION>Do you drink?</QUESTION>
        <ANSWER>Yes</ANSWER>
      </RESPONSE>
    </COMMENTS>
    <COMMENTS>
      <RESPONSE>
        <QUESTION>Do you Smoke?</QUESTION>
        <ANSWER>Yes</ANSWER>
      </RESPONSE>
    </COMMENTS>
  </SURVEYDATA>
  <SURVEYDATA>
    <SURVEY_ID>2</SURVEY_ID>
    <CLIENT_ID>DEF</CLIENT_ID>
    <COMMENTS>
      <RESPONSE>
        <QUESTION>Do you drink?</QUESTION>
        <ANSWER>No</ANSWER>
      </RESPONSE>
    </COMMENTS>
  </SURVEYDATA>
</DATA_EXPORT>') as SRC_XML
)
SELECT 
    get(XMLGET(q.value, 'SURVEY_ID'), '$') as survey_id
    ,get(XMLGET(q.value, 'CLIENT_ID'), '$') as client_id
    -- each c.value is one COMMENTS element; drill into its RESPONSE child
    ,get(XMLGET(XMLGET(c.value, 'RESPONSE'), 'QUESTION'), '$') as question
    ,get(XMLGET(XMLGET(c.value, 'RESPONSE'), 'ANSWER'), '$' ) as answer
FROM TEST_XML,
  LATERAL FLATTEN(GET(src_xml, '$')) q,   -- one row per SURVEYDATA
  LATERAL FLATTEN(get(q.value, '$')) c    -- one row per child of that SURVEYDATA
WHERE get(c.value, '@')='COMMENTS';       -- keep only the COMMENTS children

This gives one row per COMMENTS element, each with the survey_id and client_id of the survey it belongs to.
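
As a side note on why the query in the question came back as four NULL rows, here is a small diagnostic sketch. It assumes a hypothetical inline CTE named survey_xml (holding just the first SURVEYDATA from the question) rather than the asker's DATA table, and it reuses the asker's FLATTEN expression unchanged. GET(..., '@') shows the tag name of each flattened element, which should make it visible that the rows are the four direct children of the first SURVEYDATA (SURVEY_ID, CLIENT_ID and two COMMENTS), none of which has a QUESTION directly beneath it.

```sql
-- Hypothetical inline sample: the first SURVEYDATA element from the question.
WITH survey_xml AS (
  SELECT PARSE_XML('<DATA_EXPORT>
  <SURVEYDATA>
    <SURVEY_ID>1</SURVEY_ID>
    <CLIENT_ID>ABC</CLIENT_ID>
    <COMMENTS><RESPONSE><QUESTION>Do you drink?</QUESTION><ANSWER>Yes</ANSWER></RESPONSE></COMMENTS>
    <COMMENTS><RESPONSE><QUESTION>Do you Smoke?</QUESTION><ANSWER>Yes</ANSWER></RESPONSE></COMMENTS>
  </SURVEYDATA>
</DATA_EXPORT>') AS test_xml_1
)
SELECT
    d.index                      AS child_index,      -- position among the children
    GET(d.value, '@')            AS tag_name,         -- tag name of the flattened element
    XMLGET(d.value, 'QUESTION')  AS direct_question   -- QUESTION looked up one level too high
FROM survey_xml,
  -- Same FLATTEN input as in the question: children of the first SURVEYDATA only.
  LATERAL FLATTEN(GET(XMLGET(test_xml_1, 'SURVEYDATA', 0), '$')) d;
```

The instance number 0 in XMLGET(test_xml_1, 'SURVEYDATA', 0) also pins the loop to the first survey, which would explain why the second survey never showed up.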
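
One caveat worth checking against your real data: as far as I know, GET(x, '$') only returns an array when an element has several children. With a single child it returns that one element as an object, and FLATTEN would then walk the object's keys instead of the child elements. A common guard is to wrap the input in TO_ARRAY, which leaves arrays unchanged and wraps anything else in a one-element array. The sketch below assumes a hypothetical CTE named one_survey holding an export with a single SURVEYDATA; treat it as something to test, not a confirmed fix.

```sql
-- Hypothetical sample: an export containing only one SURVEYDATA element.
WITH one_survey AS (
  SELECT PARSE_XML('<DATA_EXPORT>
  <SURVEYDATA>
    <SURVEY_ID>1</SURVEY_ID>
    <CLIENT_ID>ABC</CLIENT_ID>
    <COMMENTS><RESPONSE><QUESTION>Do you drink?</QUESTION><ANSWER>Yes</ANSWER></RESPONSE></COMMENTS>
  </SURVEYDATA>
</DATA_EXPORT>') AS src_xml
)
SELECT
    GET(XMLGET(q.value, 'SURVEY_ID'), '$')                      AS survey_id,
    GET(XMLGET(q.value, 'CLIENT_ID'), '$')                      AS client_id,
    GET(XMLGET(XMLGET(c.value, 'RESPONSE'), 'QUESTION'), '$')   AS question,
    GET(XMLGET(XMLGET(c.value, 'RESPONSE'), 'ANSWER'), '$')     AS answer
FROM one_survey,
  -- TO_ARRAY guards against '$' being a single element rather than an array.
  LATERAL FLATTEN(TO_ARRAY(GET(src_xml, '$'))) q,
  LATERAL FLATTEN(TO_ARRAY(GET(q.value, '$'))) c
WHERE GET(q.value, '@') = 'SURVEYDATA'
  AND GET(c.value, '@') = 'COMMENTS';
```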