Can't get the XML SerDe to work on Hive

Posted: 2019-12-05 16:33:00

Tags: xml hive hive-serde

I've been trying to get this working for two days, but I still can't find what the problem is...

I have an XML message similar to the following:

<?xml version="1.0" encoding="UTF-8"?>
<transport_status_notification:transportStatusNotificationMessage>
<sh:StandardBusinessDocumentHeader>
    <sh:HeaderVersion>1.0</sh:HeaderVersion>
    <sh:Sender>
        <sh:Identifier Authority="GS1">000055770223</sh:Identifier>
    </sh:Sender>
    <sh:Receiver>
        <sh:Identifier Authority="GS1">RBC</sh:Identifier>
    </sh:Receiver>
    <sh:DocumentIdentification>
        <sh:Standard>GS1</sh:Standard>
        <sh:TypeVersion>3.2</sh:TypeVersion>
        <sh:InstanceIdentifier>800016162320191104210011</sh:InstanceIdentifier>
        <sh:Type>Transport Status Notification</sh:Type>
        <sh:CreationDateAndTime>2019-11-05T02:00:11Z</sh:CreationDateAndTime>
    </sh:DocumentIdentification>
</sh:StandardBusinessDocumentHeader>

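I'm trying to create a SerDe schema for it, but it doesn't work... I found out that the SerDe doesn't handle namespaces well, so I tried to work around this with /*[local-name(.)=...] in the XPath, but still no luck.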
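To illustrate what a local-name() predicate does and does not match, here is a small sketch using Python's standard-library ElementTree (not Hive's XPath engine) to mimic it. In XPath, /*[local-name(.)='HeaderVersion'] tests only the document element, so, assuming the SerDe evaluates the expression against the whole transportStatusNotificationMessage record, it can only match if the root element itself is named HeaderVersion; //*[local-name(.)='HeaderVersion'] tests elements at any depth. The namespace URIs below are hypothetical placeholders, since the posted fragment does not show its xmlns declarations, and the helper names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: the posted fragment omits its xmlns declarations,
# and a namespace-aware parser rejects undeclared prefixes, so placeholders are used.
SAMPLE = """<transport_status_notification:transportStatusNotificationMessage
    xmlns:transport_status_notification="urn:example:tsn"
    xmlns:sh="urn:example:sh">
  <sh:StandardBusinessDocumentHeader>
    <sh:HeaderVersion>1.0</sh:HeaderVersion>
  </sh:StandardBusinessDocumentHeader>
</transport_status_notification:transportStatusNotificationMessage>"""


def local_name(tag):
    """ElementTree stores namespaced tags as '{uri}local'; return only 'local'."""
    return tag.rsplit("}", 1)[-1]


def match_root_only(root, name):
    """Like /*[local-name(.)=name]/text(): only the document element is tested."""
    return [root.text] if local_name(root.tag) == name else []


def match_any_depth(root, name):
    """Like //*[local-name(.)=name]/text(): elements at every depth are tested."""
    return [el.text for el in root.iter() if local_name(el.tag) == name]


root = ET.fromstring(SAMPLE)
print(match_root_only(root, "HeaderVersion"))  # []
print(match_any_depth(root, "HeaderVersion"))  # ['1.0']
```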

This is what I tried:

CREATE EXTERNAL TABLE `default.test`(
  `headerversion` string COMMENT 'from deserializer'
)
ROW FORMAT SERDE
  'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.headerversion"="/*[local-name(.)='HeaderVersion']/text()"
)
STORED AS INPUTFORMAT
  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3a://dev-bucket/a/b/c'
TBLPROPERTIES (
  "xmlinput.start"="<transport_status_notification",
  "xmlinput.end"="</transport_status_notification>"
)
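A simplified model of how the xmlinput.start and xmlinput.end properties are assumed to carve the input into records: here XmlInputFormat is taken to search for the two strings literally, with each record running from a start marker through the next end marker (the real class reads byte streams across input splits, which this sketch ignores). Under that assumption, the start marker matches because it is a prefix of the opening tag, but "</transport_status_notification>" never occurs in a well-formed message whose root element is transport_status_notification:transportStatusNotificationMessage, so no record is ever closed. MESSAGE is a hypothetical, trimmed-down message and split_records is an illustrative helper, not part of the SerDe.

```python
def split_records(data, start, end):
    """Simplified marker-based record splitting: each record runs from an
    occurrence of `start` through the next occurrence of `end`."""
    records = []
    pos = 0
    while True:
        s = data.find(start, pos)
        if s == -1:
            break
        e = data.find(end, s + len(start))
        if e == -1:
            break  # a start marker with no matching end marker yields no record
        records.append(data[s:e + len(end)])
        pos = e + len(end)
    return records


# Hypothetical message: a well-formed document must close its root element
# with the same qualified name it was opened with.
MESSAGE = (
    "<transport_status_notification:transportStatusNotificationMessage>"
    "<sh:HeaderVersion>1.0</sh:HeaderVersion>"
    "</transport_status_notification:transportStatusNotificationMessage>"
)

START = "<transport_status_notification"
print(len(split_records(MESSAGE, START, "</transport_status_notification>")))  # 0
print(len(split_records(
    MESSAGE, START,
    "</transport_status_notification:transportStatusNotificationMessage>")))  # 1
```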

Any idea how to make this work? When I run a query against the table, it returns 0 records (which is obviously not what I expect). Thanks,
