I have this XML:
<packet>
  <field1>AA</field1>
  <students>
    <student>
      <targets>
        <target>
          <field3></field3>
          <field4></field4>
        </target>
      </targets>
      <opt><field5></field5></opt>
    </student>
  </students>
</packet>
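To make explicit the nesting that the Hive column type has to mirror, here is a small sketch using Python's standard `xml.etree.ElementTree`. It is used only to inspect the sample, not as a replacement for Hive. It turns the sample into nested dicts and lists and treats empty elements as None. The function name `to_nested` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample document from the question.
SAMPLE = """<packet>
<field1>AA</field1>
<students>
<student>
<targets>
<target>
<field3></field3>
<field4></field4>
</target>
</targets>
<opt><field5></field5></opt>
</student>
</students>
</packet>"""


def to_nested(packet: ET.Element) -> dict:
    """Hypothetical helper: mirror the XML nesting as dicts/lists.

    findtext() returns "" for an element that exists but has no text,
    so `or None` maps empty elements to None.
    """
    students = []
    for student in packet.findall("./students/student"):
        # Each <target> under <targets> becomes one record in a list.
        targets = [
            {"field3": t.findtext("field3") or None,
             "field4": t.findtext("field4") or None}
            for t in student.findall("./targets/target")
        ]
        # <opt> occurs once per student, so it is a single record, not a list.
        opt = student.find("opt")
        field5 = (opt.findtext("field5") or None) if opt is not None else None
        students.append({"targets": targets, "opt": {"field5": field5}})
    return {"field1": packet.findtext("field1"), "students": students}


row = to_nested(ET.fromstring(SAMPLE))
print(row)
```

The result is one record per `student`. Each record holds a list of `target` records plus a single `opt` record.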
I created the table with this DDL:
CREATE EXTERNAL TABLE ext_student (
  students ARRAY<STRUCT<
    field1:STRING,
    field2:STRING,
    student:ARRAY<
      STRUCT<field3:STRING,
             field4:STRING>,
      STRUCT <field5:STRING>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.field1"="/packet/field1/text()",
  "column.xpath.Students"="/packet/Students/*"
)
STORED AS
  INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 'path to my xml'
TBLPROPERTIES (
  "xmlinput.start"="<packet>",
  "xmlinput.end"="</packet>");
When I query the table, field1 comes out perfectly, but the array is displayed like this:
hive> select * from ext_student;
AA {{"field3":NULL,"field4":"NULL"},{"field5":"NULL"}}
I want
Can anyone help me? (I can't use Spark.)
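One way to check the column type in the DDL above against the XML is to derive the type mechanically from the sample. Hive's `ARRAY<...>` takes exactly one element type, so a list of `student` records is written as `ARRAY<STRUCT<...>>`, with `targets` and `opt` as fields inside that one STRUCT. The sketch below is hypothetical: the helper `hive_type` and the set `LIST_TAGS` are illustrative names, and treating `students` and `targets` as repeating containers is an assumption, since the sample has only one child in each. It only produces a type string. Whether XmlSerDe populates such a nested type from a particular XPath is not verified here.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<packet>
<field1>AA</field1>
<students>
<student>
<targets>
<target>
<field3></field3>
<field4></field4>
</target>
</targets>
<opt><field5></field5></opt>
</student>
</students>
</packet>"""

# Assumption: these containers hold repeating children (only one each in the
# sample), so they map to ARRAY rather than STRUCT.
LIST_TAGS = {"students", "targets"}


def hive_type(elem: ET.Element) -> str:
    """Hypothetical helper: derive a Hive type string from an element's nesting."""
    children = list(elem)
    if not children:
        return "STRING"  # leaf element: keep its text as a string
    if elem.tag in LIST_TAGS:
        # ARRAY has a single element type: that of the repeated child.
        return f"ARRAY<{hive_type(children[0])}>"
    # Any other element with children becomes a STRUCT keyed by child tag.
    fields = ",".join(f"{c.tag}:{hive_type(c)}" for c in children)
    return f"STRUCT<{fields}>"


packet = ET.fromstring(SAMPLE)
for column in packet:
    print(column.tag, hive_type(column))
```

For the sample this yields `STRING` for `field1` and, for `students`, a single array of structs whose `targets` field is itself an array of structs.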