Hive query to parse XML nested as an attribute value

Time: 2018-04-18 18:50:51

Tags: azure hadoop hive hiveql hdinsight

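I have an XML file in the following format. The value of the raw_xml attribute is itself a nested XML document, which I am trying to parse. The nested XML has nodes and attributes, and I want to parse them only when the active flag is True, as shown in the result.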

XML:

<?xml version="1.0" encoding="utf-16"?>
<ClassApplications xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="1234567" bundle_id="323232" version="1.0">
<Reports>
    <Report active="True" raw_xml="&lt;Response Score=&quot;474&quot;&gt;&#xD;&#xA;  &lt;StudentXML&gt;&#xD;&#xA; &lt;StudentSegments&gt;&lt;StudentSegment   MaleORFemale=&quot;M&quot;   StudentID=&quot;FS54AS44F&quot; studentname=&quot;Kathy&quot;  kob=&quot;F&quot; &gt;&lt;/StudentSegment&gt;&lt;StudentSegment   MaleORFemale=&quot;M&quot;   StudentID=&quot;ASD555ASF&quot; studentname=&quot;Kelli&quot;  kob=&quot;A&quot; &gt;&lt;/StudentSegment&gt;&lt;StudentSegment   MaleORFemale=&quot;M&quot;   StudentID=&quot;AD5A5S5D5&quot; studentname=&quot;Christy&quot;  kob=&quot;F&quot; &gt;&lt;/StudentSegment&gt; &lt;StudentSegment   MaleORFemale=&quot;M&quot;   StudentID=&quot;AS5FE84AD&quot; studentname=&quot;Julia&quot;  kob=&quot;Z&quot; &gt; &lt;/StudentSegment&gt; &lt;StudentSegment   MaleORFemale=&quot;M&quot;   StudentID=&quot;ASD5FD1D8&quot; studentname=&quot;Martina&quot;  kob=&quot;F&quot; &gt; &lt;/StudentSegment&gt;  &lt;StudentSegment   MaleORFemale=&quot;M&quot;   StudentID=&quot;ASD45454A&quot; studentname=&quot;Sam&quot;  kob=&quot;F&quot;&gt;  &lt;/StudentSegments&gt;   &lt;/StudentXML&gt; &lt;/Response&gt;"/>
    <Report active="False" raw_xml="&lt;Response Score=&quot;474&quot;&gt;&#xD;&#xA;  &lt;StudentXML&gt;&#xD;&#xA; &lt;StudentSegments&gt;&lt;StudentSegment   MaleORFemale=&quot;M&quot;   StudentID=&quot;FS54AS44F&quot; studentname=&quot;Kathy&quot;  kob=&quot;F&quot; &gt;&lt;/StudentSegment&gt;&lt;StudentSegment   MaleORFemale=&quot;F&quot;   StudentID=&quot;145sfg51g&quot; studentname=&quot;Kelli&quot;  kob=&quot;A&quot; &gt;&lt;/StudentSegment&gt;&lt;StudentSegment   MaleORFemale=&quot;M&quot;   StudentID=&quot;AD5A5S5D5&quot; studentname=&quot;Christy&quot;  kob=&quot;F&quot; &gt;&lt;/StudentSegment&gt; &lt;StudentSegment   MaleORFemale=&quot;M&quot;   StudentID=&quot;AS5FE84AD&quot; studentname=&quot;Julia&quot;  kob=&quot;Z&quot; &gt; &lt;/StudentSegment&gt; &lt;StudentSegment   MaleORFemale=&quot;M&quot;   StudentID=&quot;ASD5FD1D8&quot; studentname=&quot;Martina&quot;  kob=&quot;F&quot; &gt; &lt;/StudentSegment&gt;  &lt;StudentSegment   MaleORFemale=&quot;M&quot;   StudentID=&quot;ASD45454A&quot; studentname=&quot;Sam&quot;  kob=&quot;F&quot;&gt;  &lt;/StudentSegments&gt;   &lt;/StudentXML&gt; &lt;/Response&gt;"/>
</Reports>
</ClassApplications>

Result:

ID  MaleOrFemale    StudentID   StudentName
1234567 M   FS54AS44F   Kathy
1234567 M   ASD555ASF   Kelli
1234567 M   AD5A5S5D5   Christy
1234567 M   AS5FE84AD   Julia
1234567 M   ASD5FD1D8   Martina
1234567 M   ASD45454A   Sam

I tried writing the query with LATERAL VIEW and explode, but it fails with an error.

Error:

Hive Runtime Error while processing row {"id":1234567,"rawxml":null,"input__file__name":"wasb://root@microsofttest.blob.core.windows.net/Test/Testxml001.xml"}

Please let me know how to parse this with a Hive query.

Code:

ADD JAR wasb:///user/hivexmlserde-1.0.5.3.jar;

SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;

DROP TABLE IF EXISTS ClassTable;

CREATE EXTERNAL TABLE ClassTable(
  ID BIGINT,
  rawxml string

)
ROW FORMAT SERDE 
  'com.ibm.spss.hive.serde2.xml.XmlSerDe' 
WITH SERDEPROPERTIES ( 
   "column.xpath.ID" ="/ClassApplications/@id",

  "column.xpath.rawxml" = "/ClassApplications/Reports/Report[@active = 'True']/@raw_xml"
)
STORED AS INPUTFORMAT 
  'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'wasb://root@microsofttest.blob.core.windows.net/Test/'

TBLPROPERTIES ('serialization.null.format'='', "xmlinput.start"="<ClassApplications xmlns","xmlinput.end"="</ClassApplications>" );
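For reference, a minimal sketch of the kind of LATERAL VIEW query that could produce the result table from ClassTable. It rests on several assumptions that are not confirmed here: that rawxml is populated with the unescaped nested document for the active Report (the error above shows it arriving as null, which this sketch does not fix), that the nested document is well-formed (in the sample as posted, the last StudentSegment has no closing tag, so an XPath evaluation over it would likely fail), that every StudentSegment carries all three attributes so the arrays stay aligned, and that the Hive version provides posexplode. It uses Hive's built-in xpath UDF, which returns an array of strings, and the aliases c, s, pos and student_id are illustrative only.

```sql
-- Sketch only: one output row per StudentSegment in the nested XML.
-- The active = 'True' filter is already applied by column.xpath.rawxml in the DDL.
SELECT
  c.ID                                                      AS ID,
  -- Look up the sibling attributes at the same array position.
  xpath(c.rawxml, '//StudentSegment/@MaleORFemale')[s.pos]  AS MaleOrFemale,
  s.student_id                                              AS StudentID,
  xpath(c.rawxml, '//StudentSegment/@studentname')[s.pos]   AS StudentName
FROM (
  -- Skip rows where the SerDe produced no nested document.
  SELECT ID, rawxml
  FROM ClassTable
  WHERE rawxml IS NOT NULL
) c
-- posexplode emits (position, value) pairs for the StudentID array.
LATERAL VIEW posexplode(
  xpath(c.rawxml, '//StudentSegment/@StudentID')
) s AS pos, student_id;
```

Indexing parallel arrays by position is used here because xpath returns attribute values rather than element fragments, so the attributes of one StudentSegment cannot be read from a single exploded value.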

0 answers:

No answers yet.