How should I load an XML file that contains comments and whitespace and then use XMLGET on the root element? I cannot get the child elements

Date: 2019-10-21 12:21:38

Tags: snowflake-data-warehouse

(Submitted on behalf of a Snowflake user)


Using:

<clinical_study>
 <!-- This xml conforms to an XML Schema at:
  https://clinicaltrials.gov/ct2/html/images/info/public.xsd -->
 <required_header>
  <download_date>ClinicalTrials.gov processed this data on September 13, 2019</download_date>
  <link_text>Link to the current ClinicalTrials.gov record.</link_text>
  <url>https://clinicaltrials.gov/show/NCT00010010</url>
 </required_header>
 <id_info>
  <org_study_id>CDR0000068431</org_study_id>
  <secondary_id>NYU-0004</secondary_id>
  <secondary_id>P-UPJOHN-NYU-0004</secondary_id>
  <secondary_id>NCI-G00-1906</seco

I am getting NULL instead of the contents of the root element. I have read "How to Easily Load and Query XML Data with Snowflake Part 2" in the Snowflake documentation, and I am using:

SELECT XMLGET(src_xml, 'clinical_study'):"$",
*
FROM STG_XML
;

...but it returns NULL when I use the SQL above to try to get the contents of the root element.


Any ideas, suggestions, and/or workarounds?

2 answers:

Answer 0: (score: 2)

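As Mike Walton points out, the XML is incomplete (which prevents others from easily reproducing the NULL the OP is asking about). Once the open XML elements are closed, the reason XMLGET returns NULL is that "clinical_study" is the root node, and XMLGET retrieves elements inside the outer element. It therefore cannot find a "clinical_study" element nested within the root. To return the contents of the root node, you can use the following expression: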

src_xml:"$" AS clinical_study_contents

Here is a simple test harness that demonstrates this, along with a valid use of XMLGET (extracting the contents of the "id_info" element):

WITH STG_XML AS (
  SELECT PARSE_XML($1) AS src_xml
    FROM VALUES
           ($$
<clinical_study>
 <!-- This xml conforms to an XML Schema at:
  https://clinicaltrials.gov/ct2/html/images/info/public.xsd -->
 <required_header>
  <download_date>ClinicalTrials.gov processed this data on September 13, 2019</download_date>
  <link_text>Link to the current ClinicalTrials.gov record.</link_text>
  <url>https://clinicaltrials.gov/show/NCT00010010</url>
 </required_header>
 <id_info>
  <org_study_id>CDR0000068431</org_study_id>
  <secondary_id>NYU-0004</secondary_id>
  <secondary_id>P-UPJOHN-NYU-0004</secondary_id>
  <secondary_id>NCI-G00-1906</secondary_id>
 </id_info>
</clinical_study>
$$)
)
SELECT src_xml:"$" AS clinical_study_contents
      ,XMLGET(src_xml, 'id_info') as id_info_element
      ,*
  FROM STG_XML
;
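
As a rough sketch building on the harness above (not part of the original question), nested XMLGET calls can then pull individual values out of "id_info". The optional third argument of XMLGET is a zero-based instance number for repeated tags, and GET(..., '@') returns an element's name. The trimmed sample XML and the column aliases below are made up for illustration:

```sql
WITH STG_XML AS (
  SELECT PARSE_XML($1) AS src_xml
    FROM VALUES
           ($$
<clinical_study>
 <required_header>
  <link_text>Link to the current ClinicalTrials.gov record.</link_text>
  <url>https://clinicaltrials.gov/show/NCT00010010</url>
 </required_header>
 <id_info>
  <org_study_id>CDR0000068431</org_study_id>
  <secondary_id>NYU-0004</secondary_id>
  <secondary_id>P-UPJOHN-NYU-0004</secondary_id>
 </id_info>
</clinical_study>
$$)
)
SELECT GET(src_xml, '@')::string AS root_element_name        -- name of the root element
       -- XMLGET looks inside the element it is given, so nest calls to go deeper
      ,XMLGET(XMLGET(src_xml, 'id_info'), 'org_study_id'):"$"::string AS org_study_id
       -- third argument selects the Nth occurrence (0-based) of a repeated tag
      ,XMLGET(XMLGET(src_xml, 'id_info'), 'secondary_id', 1):"$"::string AS second_secondary_id
  FROM STG_XML
;
```

If the number of "secondary_id" elements is not known in advance, a LATERAL FLATTEN over the contents of "id_info" is probably a better fit than hard-coded instance numbers.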

Answer 1: (score: 0)

Here is a good blog post on the topic:

https://community.snowflake.com/s/article/Querying-Nested-XML-in-Snowflake

Below is one way to query nested XML elements.

    Sample XML :

    <?xml version="1.0"?>
    <comtec version="2008">
        <customer_transport_order>
            <id>2880ORO</id>
            <order_number>99833104701</order_number>
            <priority>0</priority>
            <order_date>2019-03-22</order_date>
            <order_kind>
                <code>VMI</code>
                <name>VMI</name>
            </order_kind>
            <operational>true</operational>
            <order_status>
                <code>cancel</code>
                <name>cancel</name>
                <status_kind>cancel</status_kind>
            </order_status>
            <contact>
                <id>CEN143096</id>
                <code>CEN127431</code>
                <name>SOUTHERN UNITED ENTERPRISES</name>
            </contact>
        </customer_transport_order>
    </comtec>

    Sample Query:


        select
               XMLGET( cust.value, 'order_number' ):"$"::integer as cust_order,
               XMLGET( cust.value, 'order_date' ):"$"::string as cust_date,
               XMLGET( orderkind.value, 'code' ):"$"::string as order_kind,
               XMLGET( contactval.value, 'id' ):"$"::string as contactval,
               XMLGET( contactval.value, 'code' ):"$"::string as contactcode,
               XMLGET( contactval.value, 'name' ):"$"::string as contactname
        from
            dept_emp_addr
            ,  lateral FLATTEN(dept_emp_addr.xmldata:"$") cust
            , lateral FLATTEN(cust.value:"$") orderkind
            , lateral FLATTEN(cust.value:"$") contactval
          where cust.value like '<customer_transport_order>%' AND  orderkind.value like '<order_kind>%'
          AND contactval.value like '<contact>%'
          ORDER BY cust_order;
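
When each child element occurs only once, as in the sample XML above, a simpler sketch is to nest XMLGET calls instead of flattening and filtering with LIKE. This assumes the same dept_emp_addr table with an xmldata column, mocked up here with a CTE over a trimmed copy of the sample; the "orders" CTE name and column aliases are only illustrative. A FLATTEN-based query like the one above is still needed when customer_transport_order repeats:

```sql
WITH dept_emp_addr AS (
  -- stand-in for the real table, loaded with a trimmed copy of the sample XML
  SELECT PARSE_XML($1) AS xmldata
    FROM VALUES
           ($$
<comtec version="2008">
    <customer_transport_order>
        <order_number>99833104701</order_number>
        <order_date>2019-03-22</order_date>
        <order_kind>
            <code>VMI</code>
            <name>VMI</name>
        </order_kind>
        <contact>
            <id>CEN143096</id>
            <code>CEN127431</code>
            <name>SOUTHERN UNITED ENTERPRISES</name>
        </contact>
    </customer_transport_order>
</comtec>
$$)
)
, orders AS (
  -- step from the root element down to the order element
  SELECT XMLGET(xmldata, 'customer_transport_order') AS cust
    FROM dept_emp_addr
)
SELECT XMLGET(cust, 'order_number'):"$"::integer AS cust_order
      ,XMLGET(cust, 'order_date'):"$"::string AS cust_date
      ,XMLGET(XMLGET(cust, 'order_kind'), 'code'):"$"::string AS order_kind
      ,XMLGET(XMLGET(cust, 'contact'), 'id'):"$"::string AS contactval
      ,XMLGET(XMLGET(cust, 'contact'), 'code'):"$"::string AS contactcode
      ,XMLGET(XMLGET(cust, 'contact'), 'name'):"$"::string AS contactname
  FROM orders
;
```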

