Regex to create a Hive EXTERNAL TABLE - on SELECT

Time: 2017-08-03 15:19:01

Tags: regex hive

I have the following sample data:

"HD",003498,"20160913:17:04:10","D3ZYE",1
"EH","XXX-1985977-1",1,"01","20151215","20151215","20151229","20151215","2304",,,"36-126481000",1340.74,61808.00,1126.62,0.00,214.12,0.00,0.00,0.00,"30","20151229","00653845",,,"PARTS","001","ABI","20151215","Y","Y","N","36-126481000",

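I wrote an input.regex to extract the fields, because the file contains many record types (identified by the first 2 characters of each record).

Here is my statement: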

CREATE EXTERNAL TABLE EntryHeaderTable
(RecordType STRING
,EntryNumber STRING
,VersionNumber STRING
,EntryType STRING
,ImportDate STRING
,EntryDate STRING
,EntrySummaryDate STRING
,ForeignExportDate STRING
,PortCode STRING
,MasterBillofLading STRING
,ImporterofRecord STRING
,ImporterofRecord2 STRING
,TotalDue STRING
,EnteredValue STRING
,Duty STRING
,HarborMaintenanceFee STRING
,MerchandiseProcessingFee STRING
,DeferredTax STRING
,Tax STRING
,AD_CVD STRING
,ModeofTransportation STRING
,ACHPaymentDate STRING
,BrokerReferenceNumber STRING
,ReconciliationFlagforNAFTA STRING
,ReconciliationFlagforOther STRING
,CommodityDescriptionCode STRING
,SuretyCode STRING
,VersionReasonCode STRING
,VersionDate STRING
,ABIIndicator STRING
,PaperlessIndicator STRING
,IDProceduresFlag STRING
,UltimateConsignee STRING
,Filler STRING
)
    row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
    with serdeproperties ("input.regex" = "(\EH\")(\,?.*)")
STORED AS TEXTFILE
LOCATION '/users/username/co/file'
;

On SELECT, it fails with a message saying that the number of groups does not match:

Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Number of matching groups doesn't match the number of columns

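But I counted the columns in the record definition, and as far as I can tell the count matches the total number of groups the regular expression produces.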

The regular expression does recognize the record when I test it in regex tools such as txt2re or RegExr.
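A pattern can recognize a record in a tester and still fail here, because the error is about the number of capturing groups, not about whether the line matches. RegexSerDe expects one capturing group per declared column, and the pattern above contains two pairs of capturing parentheses while the table declares 34 columns. Below is a minimal sketch of that check. It uses Python's re module as a stand-in for the Java regex engine that Hive uses, which is an assumption that should hold for counting groups in a pattern this simple. The name effective_pattern is hypothetical and is an assumed rendering of the pattern after the HiveQL string escapes are resolved.

```python
import re

# Assumed form of the pattern once the HiveQL string literal's backslash
# escapes are resolved (\" -> " and \, -> ,). However the escapes resolve,
# the pattern has exactly two pairs of capturing parentheses.
effective_pattern = r'(EH")(,?.*)'

# Number of columns declared in EntryHeaderTable (RecordType .. Filler).
declared_columns = 34

# .groups is the number of capturing groups in the compiled pattern.
group_count = re.compile(effective_pattern).groups

print(group_count, declared_columns)  # prints: 2 34
```

The second group, (,?.*), captures the whole remainder of the line as one value; it does not split it into fields.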

Update: I also tried the following definition to indicate that the second field occurs 33 times:

    with serdeproperties ("input.regex" = "(\EH\"){1,33}(\,?.*)")

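The only reason I am not treating this as plain CSV is that I want to supply the record type as a variable, since the fields change with each record type.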
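On the update: a quantifier such as {1,33} repeats a group, but it does not add capturing groups, so the count stays at two wherever the quantifier is placed. One way to get 34 groups is to write one group per column and pin the first group to the record type. The sketch below builds such a pattern and checks it against the two sample lines. It uses Python's re module as a stand-in for Java's regex engine, and the names FIELD, NUM_COLUMNS and per_column_pattern are hypothetical. It has not been run against Hive, and it assumes that no quoted field contains a comma.

```python
import re

# A quantifier repeats a group without creating new capturing groups.
quantified = re.compile(r'(EH"){1,33}(,?.*)')
print(quantified.groups)  # prints: 2

# One field: either a double-quoted value or a run of non-comma characters
# (possibly empty). Assumes quoted values never contain commas.
FIELD = r'("[^"]*"|[^,]*)'
NUM_COLUMNS = 34

# First group pinned to the record type, then one group per remaining column.
per_column_pattern = ",".join([r'("EH")'] + [FIELD] * (NUM_COLUMNS - 1))
compiled = re.compile(per_column_pattern)
print(compiled.groups)  # prints: 34

sample_eh = (
    r'"EH","XXX-1985977-1",1,"01","20151215","20151215","20151229",'
    r'"20151215","2304",,,"36-126481000",1340.74,61808.00,1126.62,0.00,'
    r'214.12,0.00,0.00,0.00,"30","20151229","00653845",,,"PARTS","001",'
    r'"ABI","20151215","Y","Y","N","36-126481000",'
)
sample_hd = r'"HD",003498,"20160913:17:04:10","D3ZYE",1'

# fullmatch requires the whole line to match the pattern.
eh_match = compiled.fullmatch(sample_eh)
hd_match = compiled.fullmatch(sample_hd)

print(eh_match.group(1), eh_match.group(13))  # "EH" 1340.74
print(hd_match)  # None: other record types do not match
```

Captured values keep their surrounding double quotes. To use a pattern like this in input.regex, the double quotes and any backslashes would also need to be escaped for the HiveQL string literal.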

0 answers:

No answers yet