将复杂的nexted XML加载到hive表

时间:2018-02-16 12:10:33

标签: sql xml hadoop hive xml-parsing

是否可以将如下所示的复杂嵌套xml添加到hive表中。

<items>
<item id="0001" type="donut">
<name>Cake</name>
<ppu>0.55</ppu>
<batters>
<batter id="1001">Regular</batter>
<batter id="1002">Chocolate</batter>
<batter id="1003">Blueberry</batter>
<batter id="1003">Devil's Food</batter>
</batters>
<topping id="5001">None</topping>
<topping id="5002">Glazed</topping>
<topping id="5005">Sugar</topping>
<topping id="5007">Powdered Sugar</topping>
<topping id="5006">Chocolate with Sprinkles</topping>
<topping id="5003">Chocolate</topping>
<topping id="5004">Maple</topping>
</item>
<item id="0002" type="donut">
<name>Raised</name>
<ppu>0.55</ppu>
<batters>
<batter id="1001">Regular</batter>
</batters>
<topping id="5001">None</topping>
<topping id="5002">Glazed</topping>
<topping id="5005">Sugar</topping>
<topping id="5003">Chocolate</topping>
<topping id="5004">Maple</topping>
</item>
<item id="0003" type="donut">
<name>Buttermilk</name>
<ppu>0.55</ppu>
<batters>
<batter id="1001">Regular</batter>
<batter id="1002">Chocolate</batter>
</batters>
</item>
<item id="0004" type="bar">
<name>Bar</name>
<ppu>0.75</ppu>
<batters>
<batter id="1001">Regular</batter>
</batters>
<topping id="5003">Chocolate</topping>
<topping id="5004">Maple</topping>
<fillings>
<filling id="7001">
<name>None</name>
<addcost>0</addcost>
</filling>
<filling id="7002">
<name>Custard</name>
<addcost>0.25</addcost>
</filling>
<filling id="7003">
<name>Whipped Cream</name>
<addcost>0.25</addcost>
</filling>
</fillings>
</item>
<item id="0005" type="twist">
<name>Twist</name>
<ppu>0.65</ppu>
<batters>
<batter id="1001">Regular</batter>
</batters>
<topping id="5002">Glazed</topping>
<topping id="5005">Sugar</topping>
</item>
<item id="0006" type="filled">
<name>Filled</name>
<ppu>0.75</ppu>
<batters>
<batter id="1001">Regular</batter>
</batters>
<topping id="5002">Glazed</topping>
<topping id="5007">Powdered Sugar</topping>
<topping id="5003">Chocolate</topping>
<topping id="5004">Maple</topping>
<fillings>
<filling id="7002">
<name>Custard</name>
<addcost>0</addcost>
</filling>
<filling id="7003">
<name>Whipped Cream</name>
<addcost>0</addcost>
</filling>
<filling id="7004">
<name>Strawberry Jelly</name>
<addcost>0</addcost>
</filling>
<filling id="7005">
<name>Rasberry Jelly</name>
<addcost>0</addcost>
</filling>
</fillings>
</item>
</items>

我能够映射到1001,1002,1003,但是相同的值,我无法提取。 我将xml加载到hive表并使用xpath解压缩。我需要定期,巧克力,蓝莓的价值。

我将以下内容添加到hive表(store.choclate)中,并将查询添加为

从store.chocolate中选择xpath(str,'/ items / item / batters / batter / @ id')

这给出了值1001,1002,1003。如何编写查询来提取常规,巧克力和蓝色?

1 个答案:

答案 0 :(得分:0)

在查询中将xml数据作为单个表加载并在其上创建视图。该查询的结构为

select xpath(str, '/items/item/batters/batter[@id="1001"]/text()')

这将提取&#39; Regular&#39;的价值。在类似的基础上,它可以为其他领域构建