How to parse nested XML in Hive

Time: 2019-02-26 14:37:48

Tags: sql xml hive hive-serde

I have this XML:

<packet>
    <field1>AA</field1>
    <students>
        <student>
              <targets>
                    <target>
                           <field3></field3>
                           <field4></field4>
                    </target>
              </targets>
              <opt><field5></field5></opt>
        </student>
    </students>
</packet>

I created the DDL as:

CREATE EXTERNAL TABLE ext_student (

students ARRAY<STRUCT<
field1:STRING,
field2:STRING,
student:ARRAY<
              STRUCT<field3:STRING,
                     field4:STRING>,
              STRUCT <field5:STRING>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.field1"="/packet/field1/text()",
"column.xpath.Students"="/packet/Students/*"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 'path to my xml'
TBLPROPERTIES (
"xmlinput.start"="<packet>",
"xmlinput.end"="</packet>");  

When I query the table, field1 comes through fine, but the array is displayed like this:

hive> select * from ext_student;

AA {{"field3":NULL,"field4":"NULL"},{"field5":"NULL"}}

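I want to:

  1. Have Hive display the actual values (not all of the items are empty in the source XML)
  2. Display my output in the form AA valueofField3 valueofField4 valueofField5

Can anyone help me, please? (I can't use Spark.)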
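
One possible way to get the flat output described above, offered as an untested sketch rather than a verified answer. XPath is case-sensitive, so "/packet/Students/*" in the DDL does not match the lowercase <students> element in the XML, which may be contributing to the NULLs. Instead of a nested column, each leaf element can be mapped to its own STRING column with a text() XPath, following the same "column.xpath.<name>" pattern already used for field1. The table name ext_student_flat is hypothetical, the LOCATION placeholder is kept exactly as in the question, and the sketch assumes each <packet> holds exactly one <student> with one <target>.

```sql
-- Sketch only: ext_student_flat is a hypothetical table name.
-- Assumes one <student> with one <target> per <packet>.
-- XPath is case-sensitive, so element names must match the XML exactly.
CREATE EXTERNAL TABLE ext_student_flat (
  field1 STRING,
  field3 STRING,
  field4 STRING,
  field5 STRING
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.field1"="/packet/field1/text()",
  "column.xpath.field3"="/packet/students/student/targets/target/field3/text()",
  "column.xpath.field4"="/packet/students/student/targets/target/field4/text()",
  "column.xpath.field5"="/packet/students/student/opt/field5/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 'path to my xml'
TBLPROPERTIES (
  "xmlinput.start"="<packet>",
  "xmlinput.end"="</packet>"
);

-- One row per <packet>, with the leaf values as plain columns.
SELECT field1, field3, field4, field5
FROM ext_student_flat;
```

If a packet can contain several students or targets, this flat mapping no longer fits; the nested column would then need to be kept and expanded in the query (for example with LATERAL VIEW explode), which this sketch does not cover.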

0 answers:

No answers yet