Hive XML: array of nested structs

Time: 2018-08-30 22:21:34

Tags: xml database hive bigdata hiveql

I have this XML file:

<?xml version="1.0" encoding="UTF-8"?>
<pfx:BusinessUnits xmlns:pfx="http://someurl" xmlns:xsi="http://someurl" xsi:schemaLocation="http://someurl BusinessUnitDetails.xsd">
<businessUnit xmlns="http://someurl">
    <businessUnit>
        <name>store one</name>
        <number>1</number>
    </businessUnit>
    <Departments>
        <number>8</number>
        <description>division eight</description>
        <contact>
            <phone>
                <number>123456789</number>
                <type>primary</type>
            </phone>
            <type>MAIN PHONE 1</type>
        </contact>
        <contact>
            <phone>
                <number>987654321</number>
                <type>secondary</type>
            </phone>
            <type>MAIN PHONE 2</type>
        </contact>
    </Departments>
    <Departments>
        <number>10</number>
        <description>division 2</description>
        <contact>
            <phone>
                <number>123456789</number>
                <type>primary</type>
            </phone>
            <type>MAIN PHONE 1</type>
        </contact>
        <contact>
            <phone>
                <number>987654321</number>
                <type>secondary</type>
            </phone>
            <type>MAIN PHONE 2</type>
        </contact>
    </Departments>
</businessUnit>
</pfx:BusinessUnits>

This is an array of structs, and each of those structs contains another array of structs. I have to load this file into Hive, so I am using the Hive XML SerDe.

CREATE TABLE mydatabase.xmlLoad(
    store_nbr INT,
    divisions array<
        struct<
            divisionDepartments:struct<
                number:string,
                contact:array<struct<
                    phone:struct<
                        number:string,
                        type:string>,
                    type:string
                >>
            >
        >
    >
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
    'column.xpath.store_nbr'='/*[local-name()="businessUnitDetail"]/*[local-name()="businessUnit"]/*[local-name()="number"]/text()',
    'column.xpath.divisions'='/*[local-name()="businessUnitDetail"]/*[local-name()="divisionDepartments"]/*'
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
    'xmlinput.start'='<businessUnitDetail',
    'xmlinput.end'='</businessUnitDetail>'
);

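I only want the store number, the department number, and the contact information for MAIN PHONE 1.

Unfortunately, the query does not give me the result I need.

Output: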

store_nbr
48
98
99
100

 Departments
[{"Departments":null},{"Departments":null},{"divisiondepartments":null},{"Departments":null},{"Departments":null},{"divisiondepartments":null},{"Departments":null},{"Departments":null},

Expected output:

store_nbr | division_nbr | contact_type | phone_nbr
1 | 8 | MAIN PHONE 1 | 123456789
2 | 10 | MAIN PHONE 1 | 987654321

How can I get this result?
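For reference, the flattening step described above (one row per store, department, and contact, keeping only MAIN PHONE 1) could be sketched in HiveQL roughly as follows. This is only a sketch under a strong assumption: that the `divisions` column is actually populated with the struct layout declared in the `CREATE TABLE` statement, which the output shown above suggests is not yet the case. The aliases `t`, `dv`, `d`, `ct`, and `c` are hypothetical names introduced for illustration.

```sql
-- Assumes mydatabase.xmlLoad is loaded and `divisions` holds non-null structs
-- shaped as array<struct<divisionDepartments:struct<number, contact:array<...>>>>.
SELECT
    t.store_nbr,
    d.divisionDepartments.number AS division_nbr,
    c.type                       AS contact_type,
    c.phone.number               AS phone_nbr
FROM mydatabase.xmlLoad t
-- One row per element of the outer array (one per department).
LATERAL VIEW explode(t.divisions) dv AS d
-- One row per element of the inner array (one per contact).
LATERAL VIEW explode(d.divisionDepartments.contact) ct AS c
-- Keep only the contact entries labelled MAIN PHONE 1.
WHERE c.type = 'MAIN PHONE 1';
```

Chaining two `LATERAL VIEW explode(...)` clauses is the usual way to unnest an array inside an array in Hive. It does not address why the structs come back as null, which appears to be a separate problem in how the XPath expressions and struct field names line up with the element names in the file.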

0 answers:

No answers