How to import XML data into Hive using attributes as columns

Time: 2013-03-22 20:05:46

Tags: xpath hive hiveql

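I'm new to HiveQL and I'm a bit stuck.

I have data stored in XML format, and I want to extract fields from this XML file into the columns of a Hive table (string Titles_2, string Artists_2, string Albums_2).

Example of the XML data: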

<?xml version="1.0" encoding="UTF-8"?><MC><SC><S uid="2" gen="" yr="2011" art="Samsung" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Samsung/Music" alb="Samsung" ttl="Over the horizon"/><S uid="37" gen="" yr="2010" art="Jason Derulo" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Jason Derulo/Jason Derulo" alb="Jason Derulo" ttl="Whatcha Say"/><S uid="38" gen="" yr="2010" art="Jason Derulo" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Jason Derulo/Jason Derulo" alb="Jason Derulo" ttl="In My Head"/><S uid="39" gen="" yr="2011" art="Alexandra Stan" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Alexandra Stan/Mr_ Saxobeat - Single" alb="Mr. Saxobeat - Single" ttl="Mr. Saxobeat (Extended Version)"/><S uid="40" gen="" yr="2011" art="Bushido" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Bushido/Jenseits von Gut und Böse (Premium Edition)" alb="Jenseits von Gut und Böse (Premium Edition)" ttl="Wie ein Löwe"/><S uid="41" gen="" yr="2011" art="Bushido" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Bushido/Jenseits von Gut und Böse (Premium Edition)" alb="Jenseits von Gut und Böse (Premium Edition)" ttl="Verreckt"/><S uid="42" gen="" yr="2011" art="Lucenzo" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/Music/Lucenzo/Danza Kuduro (feat_ Don Omar) [From _Fast &amp; Furious 5_] - Single" alb="Danza Kuduro (feat. Don Omar) [From &quot;Fast &amp; Furious 5&quot;] - Single" ttl="Danza Kuduro (feat. Don Omar) [From &quot;Fast &amp; Furious 5&quot;]"/><S uid="121" gen="" yr="701" art="Michael Jackson" cmp="&lt;unknown&gt;" fld="/mnt/sdcard/external_sd/Music/Michael Jackson/Bad [Bonus Tracks]" alb="Bad [Bonus Tracks]" ttl="Voice-Over Intro/Quincy Jones Interview #1 [*]"/></SC><PC/></MC>

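This data is stored in a table named xmlout_2(line).

Now I run these xpath commands to build the HiveQL table Stores, but it only adds the first song of each row. Any idea why this happens?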

    CREATE VIEW xmlout_2(line) AS SELECT * FROM hivetesttable;

    CREATE VIEW Stores(Titles_2, Artists_2, Albums_2) AS
    SELECT
      xpath_string(line, '/MC/SC/*/@ttl'),
      xpath_string(line, '/MC/SC/*/@art'),
      xpath_string(line, '/MC/SC/*/@alb')
    FROM xmlout_2;
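For reference, the Hive XPath UDF documentation describes `xpath_string` as returning the text of the first matching node, which would explain getting only one song per row. A minimal way to check this in isolation is to run the function on a shortened inline document. The two-song XML literal below is a made-up stand-in for the real data, and a `SELECT` without a `FROM` clause assumes Hive 0.13 or later.

```sql
-- Hypothetical two-song document, shortened from the sample above.
-- The expression matches two @ttl attributes, but xpath_string is
-- documented to return only the text of the first matching node.
SELECT xpath_string(
  '<MC><SC><S uid="1" ttl="First"/><S uid="2" ttl="Second"/></SC></MC>',
  '/MC/SC/*/@ttl'
);
```

If this returns only the first title, the view is behaving as documented rather than dropping data during loading.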

If I try xpath instead of xpath_string, I get an array of strings instead of a single string.

    CREATE VIEW xmlout_2(line) AS SELECT * FROM hivetesttable;

    CREATE VIEW Stores(Titles_2, Artists_2, Albums_2) AS
    SELECT
      xpath(line, '/MC/SC/*/@ttl'),
      xpath(line, '/MC/SC/*/@art'),
      xpath(line, '/MC/SC/*/@alb')
    FROM xmlout_2;

I would like to explode the columns after this, but explode can only be used on a single column.
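One possible way around the single-column limit, sketched here under assumptions rather than tested against this data, is to explode just one array (the `uid` attributes) with `LATERAL VIEW` and then look up the other attributes of the same `<S>` element with an XPath predicate. The sketch assumes that `uid` is unique within each XML document and that every song is an `<S>` element under `/MC/SC`; the aliases `x`, `u` and `uid` are illustrative names.

```sql
-- Sketch: produce one output row per <S> element.
-- Assumes @uid is unique within each document stored in xmlout_2.line.
SELECT
  xpath_string(x.line, concat('/MC/SC/S[@uid="', u.uid, '"]/@ttl')) AS Titles_2,
  xpath_string(x.line, concat('/MC/SC/S[@uid="', u.uid, '"]/@art')) AS Artists_2,
  xpath_string(x.line, concat('/MC/SC/S[@uid="', u.uid, '"]/@alb')) AS Albums_2
FROM xmlout_2 x
-- explode() turns the array of uid values into one row per uid
LATERAL VIEW explode(xpath(x.line, '/MC/SC/S/@uid')) u AS uid;
```

Looking each song up by `uid` keeps the three attributes tied to the same element, which exploding three separate arrays would not guarantee if an attribute were missing on some song. The cost is that the XML is re-parsed for every lookup, which may be slow on large documents.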

0 answers:

No answers