So I have this XML file:
<?xml version="1.0" encoding="UTF-8"?>
<pfx:BusinessUnits xmlns:pfx="http://someurl" xmlns:xsi="http://someurl" xsi:schemaLocation="http://someurl BusinessUnitDetails.xsd">
<businessUnit xmlns="http://someurl">
<businessUnit>
<name>store one</name>
<number>1</number>
</businessUnit>
<Departments>
<number>8</number>
<description>division eight</description>
<contact>
<phone>
<number>123456789</number>
<type>primary</type>
</phone>
<type>MAIN PHONE 1</type>
</contact>
<contact>
<phone>
<number>987654321</number>
<type>secondary</type>
</phone>
<type>MAIN PHONE 2</type>
</contact>
</Departments>
<Departments>
<number>10</number>
<description>division 2</description>
<contact>
<phone>
<number>123456789</number>
<type>primary</type>
</phone>
<type>MAIN PHONE 1</type>
</contact>
<contact>
<phone>
<number>987654321</number>
<type>secondary</type>
</phone>
<type>MAIN PHONE 2</type>
</contact>
</Departments>
</businessUnit>
</pfx:BusinessUnits>
This is an array of structs, and inside it is another array of structs. I have to load this file into Hive, so I'm using the Hive XML SerDe:
CREATE TABLE mydatabase.xmlLoad(
  store_nbr INT,
  divisions
    array<
      struct<
        divisionDepartments:struct<
          number:string,
          contact:array<struct<
            phone:struct<
              number:string,
              type:string>,
            type:string
          >
        >>
      >
    >
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  'column.xpath.store_nbr'='/*[local-name()="businessUnitDetail"]/*[local-name()="businessUnit"]/*[local-name()="number"]/text()',
  'column.xpath.divisions'='/*[local-name()="businessUnitDetail"]/*[local-name()="divisionDepartments"]/*'
)
STORED AS
  INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
  'xmlinput.start'='<businessUnitDetail',
  'xmlinput.end'='</businessUnitDetail>'
);
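The `column.xpath.divisions` property ends in `/*`, which in XPath selects the child elements of each matched department, not the department elements themselves. The sketch below shows that difference using Python's standard `xml.etree.ElementTree` on a trimmed copy of the sample. It is not the Hive SerDe: the `{*}` namespace wildcard (Python 3.8+) stands in for `local-name()`, the closing tags missing from the posted XML are assumed, and it uses `Departments` as in the sample rather than `divisionDepartments` as in the properties. Whether this is what produces the `null` entries is an assumption worth checking, not a confirmed diagnosis.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted sample: one store, two departments, one
# contact each. Closing tags are assumed (the posted copy is missing some).
SAMPLE = (
    '<pfx:BusinessUnits xmlns:pfx="http://someurl">'
    '<businessUnit xmlns="http://someurl">'
    '<businessUnit><name>store one</name><number>1</number></businessUnit>'
    '<Departments><number>8</number><description>division eight</description>'
    '<contact><phone><number>123456789</number><type>primary</type></phone>'
    '<type>MAIN PHONE 1</type></contact></Departments>'
    '<Departments><number>10</number><description>division 2</description>'
    '<contact><phone><number>123456789</number><type>primary</type></phone>'
    '<type>MAIN PHONE 1</type></contact></Departments>'
    '</businessUnit></pfx:BusinessUnits>'
)


def local_name(tag):
    """Strip the '{namespace}' prefix, like XPath local-name()."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(SAMPLE)

# Path ending at the department element: one node per department.
departments = root.findall("./{*}businessUnit/{*}Departments")
# Same path with a trailing '/*': the departments' child elements instead.
children = root.findall("./{*}businessUnit/{*}Departments/*")

print(len(departments), [local_name(e.tag) for e in departments])
print(len(children), [local_name(e.tag) for e in children])
```

The first path yields two `Departments` nodes; the second yields six nodes (`number`, `description`, `contact` for each department), so an array built from it would have one entry per child element rather than one per department.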
I only want the store number, the department number, and the contact information for MAIN PHONE 1.
Unfortunately, this doesn't give me the results I need.
Output:
store_nbr
48
98
99
100
Departments
[{"Departments":null},{"Departments":null},{"divisiondepartments":null},{"Departments":null},{"Departments":null},{"divisiondepartments":null},{"Departments":null},{"Departments":null},
Expected output:
store_nbr | division_nbr | contact_type | phone_nbr
1         | 8            | MAIN PHONE 1 | 123456789
2         | 10           | MAIN PHONE 1 | 987654321
How can I get this result?
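The target rows amount to flattening two nested arrays and filtering: one row source per department, then one row per contact, keeping only contacts whose own `<type>` is `MAIN PHONE 1`. A minimal Python sketch of that logic on the sample can help confirm which values the Hive query should return. It uses the standard-library `xml.etree.ElementTree`, not Hive; `flatten` is a hypothetical helper, and the closing tags missing from the posted XML are assumed.

```python
import xml.etree.ElementTree as ET

# Compact copy of the posted sample. The closing </contact> and
# </Departments> tags are assumed; the posted copy is missing them.
SAMPLE = (
    '<pfx:BusinessUnits xmlns:pfx="http://someurl">'
    '<businessUnit xmlns="http://someurl">'
    '<businessUnit><name>store one</name><number>1</number></businessUnit>'
    '<Departments><number>8</number><description>division eight</description>'
    '<contact><phone><number>123456789</number><type>primary</type></phone>'
    '<type>MAIN PHONE 1</type></contact>'
    '<contact><phone><number>987654321</number><type>secondary</type></phone>'
    '<type>MAIN PHONE 2</type></contact></Departments>'
    '<Departments><number>10</number><description>division 2</description>'
    '<contact><phone><number>123456789</number><type>primary</type></phone>'
    '<type>MAIN PHONE 1</type></contact>'
    '<contact><phone><number>987654321</number><type>secondary</type></phone>'
    '<type>MAIN PHONE 2</type></contact></Departments>'
    '</businessUnit></pfx:BusinessUnits>'
)


def flatten(xml_text, contact_type="MAIN PHONE 1"):
    """Return (store_nbr, division_nbr, contact_type, phone_nbr) tuples,
    one per contact whose own <type> equals contact_type."""
    root = ET.fromstring(xml_text)
    rows = []
    # {*} matches any namespace (Python 3.8+), much like local-name() in XPath.
    for unit in root.findall("{*}businessUnit"):
        store_nbr = unit.findtext("{*}businessUnit/{*}number")
        for dept in unit.findall("{*}Departments"):     # iterate departments
            division_nbr = dept.findtext("{*}number")
            for contact in dept.findall("{*}contact"):  # iterate contacts
                # Compare the contact's own <type>, not the nested <phone><type>.
                if contact.findtext("{*}type") == contact_type:
                    phone_nbr = contact.findtext("{*}phone/{*}number")
                    rows.append((store_nbr, division_nbr, contact_type, phone_nbr))
    return rows


for row in flatten(SAMPLE):
    print(" | ".join(row))
```

Run against the posted sample, this prints `1 | 8 | MAIN PHONE 1 | 123456789` and `1 | 10 | MAIN PHONE 1 | 123456789`: the sample contains a single store, and both departments list 123456789 as MAIN PHONE 1. In Hive, the same department-then-contact flattening is usually written with two `LATERAL VIEW explode(...)` clauses and a `WHERE` filter on the contact type.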