Question

I have this XML:

<packet>
    <field1>AA</field1>
    <students>
        <student>
              <targets>
                    <target>
                           <field3></field3>
                           <field4></field4>
                    </target>
              </targets>
              <opt><field5></field5></opt>
        </student>
    </students>
</packet>

I created the DDL as:

CREATE EXTERNAL TABLE ext_student (

students ARRAY<STRUCT<
field1:STRING,
field2:STRING,
student:ARRAY<
              STRUCT<field3:STRING,
                     field4:STRING>,
              STRUCT <field5:STRING>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.field1"="/packet/field1/text()",
"column.xpath.Students"="/packet/Students/*"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 'path to my xml'
TBLPROPERTIES (
"xmlinput.start"="<packet>",
"xmlinput.end"="</packet>");

When I query the table, field1 is displayed correctly, but the array is shown like this:

hive> select * from ext_student;

AA {{"field3":NULL,"field4":"NULL"},{"field5":"NULL"}}

I want to:

make Hive show the actual values (not every item in the XML is empty)
display my output in a flat form, for example: AA valueofField3 valueofField4 valueofField5

Can anyone help me, please? (I cannot use Spark.)

How to parse nested XML in Hive

0 answers: