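I have an XML file in the format shown below. The value of the raw_xml attribute is itself a nested XML document, which I am trying to parse. The nested XML contains nodes and attributes, and I only want to parse them when the active flag is "True", as shown in the expected result.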
<?xml version="1.0" encoding="utf-16"?>
<ClassApplications xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="1234567" bundle_id="323232" version="1.0">
<Reports>
<Report active="True" raw_xml="<Response Score="474">
 <StudentXML>
 <StudentSegments><StudentSegment MaleORFemale="M" StudentID="FS54AS44F" studentname="Kathy" kob="F" ></StudentSegment><StudentSegment MaleORFemale="M" StudentID="ASD555ASF" studentname="Kelli" kob="A" ></StudentSegment><StudentSegment MaleORFemale="M" StudentID="AD5A5S5D5" studentname="Christy" kob="F" ></StudentSegment> <StudentSegment MaleORFemale="M" StudentID="AS5FE84AD" studentname="Julia" kob="Z" > </StudentSegment> <StudentSegment MaleORFemale="M" StudentID="ASD5FD1D8" studentname="Martina" kob="F" > </StudentSegment> <StudentSegment MaleORFemale="M" StudentID="ASD45454A" studentname="Sam" kob="F"> </StudentSegments> </StudentXML> </Response>"/>
<Report active="False" raw_xml="<Response Score="474">
 <StudentXML>
 <StudentSegments><StudentSegment MaleORFemale="M" StudentID="FS54AS44F" studentname="Kathy" kob="F" ></StudentSegment><StudentSegment MaleORFemale="F" StudentID="145sfg51g" studentname="Kelli" kob="A" ></StudentSegment><StudentSegment MaleORFemale="M" StudentID="AD5A5S5D5" studentname="Christy" kob="F" ></StudentSegment> <StudentSegment MaleORFemale="M" StudentID="AS5FE84AD" studentname="Julia" kob="Z" > </StudentSegment> <StudentSegment MaleORFemale="M" StudentID="ASD5FD1D8" studentname="Martina" kob="F" > </StudentSegment> <StudentSegment MaleORFemale="M" StudentID="ASD45454A" studentname="Sam" kob="F"> </StudentSegments> </StudentXML> </Response>"/>
</Reports>
</ClassApplications>
Expected result:
ID MaleOrFemale StudentID StudentName
1234567 M FS54AS44F Kathy
1234567 M ASD555ASF Kelli
1234567 M AD5A5S5D5 Christy
1234567 M AS5FE84AD Julia
1234567 M ASD5FD1D8 Martina
1234567 M ASD45454A Sam
I tried writing a query with LATERAL VIEW and explode(), but it fails with an error.
Error:
Hive Runtime Error while processing row {"id":1234567,"rawxml":null,"input__file__name":"wasb://root@microsofttest.blob.core.windows.net/Test/Testxml001.xml"}
Please let me know how to handle this case with a Hive query.
Code:
ADD JAR wasb:///user/hivexmlserde-1.0.5.3.jar;
SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;
DROP TABLE IF EXISTS ClassTable;
CREATE EXTERNAL TABLE ClassTable(
ID BIGINT,
rawxml string
)
ROW FORMAT SERDE
'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.ID" ="/ClassApplications/@id",
"column.xpath.rawxml" = "/ClassApplications/Reports/Report[@active = 'True']/@raw_xml"
)
STORED AS INPUTFORMAT
'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'wasb://root@microsofttest.blob.core.windows.net/Test/'
TBLPROPERTIES ('serialization.null.format'='', "xmlinput.start"="<ClassApplications xmlns","xmlinput.end"="</ClassApplications>" );
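For reference, a minimal sketch of the kind of query being asked for, untested against this data. It assumes the SerDe returns a non-null rawxml string holding exactly one well-formed <Response> document (the error above shows rawxml arriving as null, so that has to be resolved first), and that every StudentSegment carries all three attributes so the parallel arrays line up by position. It uses only Hive's built-in xpath() UDF, which returns an array<string>, together with posexplode(); the aliases s, pos and the output column names are illustrative.

```sql
-- Sketch only: one output row per StudentSegment inside the active report.
-- Assumes rawxml is a single well-formed <Response> document and that each
-- StudentSegment has MaleORFemale, StudentID and studentname attributes.
SELECT
  c.ID,
  -- Look up the matching attribute by the position of the exploded StudentID.
  xpath(c.rawxml,
        '/Response/StudentXML/StudentSegments/StudentSegment/@MaleORFemale')[s.pos] AS MaleOrFemale,
  s.StudentID,
  xpath(c.rawxml,
        '/Response/StudentXML/StudentSegments/StudentSegment/@studentname')[s.pos] AS StudentName
FROM ClassTable c
-- posexplode yields (index, value) pairs for each StudentID in the array.
LATERAL VIEW posexplode(
  xpath(c.rawxml,
        '/Response/StudentXML/StudentSegments/StudentSegment/@StudentID')
) s AS pos, StudentID
WHERE c.rawxml IS NOT NULL;
```

The active = 'True' filter is already applied by the column.xpath.rawxml expression in the table definition, so the query itself does not repeat it.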