Hive table source delimited by multiple spaces

Time: 2016-12-13 06:25:45

Tags: hive hadoop2 hive-serde

How can I make the field delimiter for the following table's source data one or more spaces?

CREATE EXTERNAL TABLE weather (USAF INT, WBAN INT, `Date` STRING, DIR STRING, SPD INT, GUS INT, CLG INT, SKC STRING, L STRING, M STRING, H STRING, VSB DECIMAL, MW1 STRING, MW2 STRING, MW3 STRING, MW4 STRING, AW1 STRING, AW2 STRING, AW3 STRING, AW4 STRING, W STRING, TEMP INT, DEWP INT, SLP DECIMAL, ALT DECIMAL, STP DECIMAL, MAX INT, MIN INT, PCP01 DECIMAL, PCP06 DECIMAL, PCP24 DECIMAL, PCPXX DECIMAL, SD INT)
COMMENT 'weather table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\n'
LOCATION '/data/Weather';

For example, there is only one space between USAF and WBAN, while there are 3 spaces between `Date` and DIR.
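
To make the misalignment concrete, here is a minimal Python sketch of what a single-space delimiter does to such a row. The sample line is invented for illustration and is not taken from the real dataset; it only reproduces the spacing described above (one space, then three spaces).

```python
import re

# Hypothetical sample row: one space between the first fields,
# three spaces between the third and fourth fields.
line = "690190 13910 200601010000   120 5"

# Splitting on exactly one space, which is what a single-character
# field terminator amounts to, yields an empty field per extra space.
single_space_fields = line.split(" ")

# Splitting on runs of whitespace yields the intended fields.
whitespace_run_fields = re.split(r"\s+", line.strip())

print(len(single_space_fields), single_space_fields)
print(len(whitespace_run_fields), whitespace_run_fields)
```

With a one-character delimiter, every extra space becomes an empty field, so the later values shift into the wrong columns.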

I tried the following:

CREATE EXTERNAL TABLE weather (USAF INT, WBAN INT, `Date` STRING, DIR STRING, SPD INT, GUS INT, CLG INT, SKC STRING, L STRING, M STRING, H STRING, VSB DECIMAL, MW1 STRING, MW2 STRING, MW3 STRING, MW4 STRING, AW1 STRING, AW2 STRING, AW3 STRING, AW4 STRING, W STRING, TEMP INT, DEWP INT, SLP DECIMAL, ALT DECIMAL, STP DECIMAL, MAX INT, MIN INT, PCP01 DECIMAL, PCP06 DECIMAL, PCP24 DECIMAL, PCPXX DECIMAL, SD INT)
COMMENT 'weather table'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "\s+"
)
STORED AS TEXTFILE
LOCATION '/data/Weather';

But when I run a simple query such as select * from weather limit 100; I get the following error:

Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Number of matching groups doesn't match the number of columns
Time taken: 2.958 seconds
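
The error message points at capturing groups: RegexSerDe maps each capturing group in `input.regex` to one column, and the pattern `\s+` contains none, while the table declares 33 columns. The sketch below uses Python's `re` module, which treats these simple patterns the same way, to count the groups and to build a pattern with one group per column. It assumes every field in a row is a non-empty token with no embedded spaces, which may not hold for the real data, and the sample row is made up.

```python
import re

NUM_COLUMNS = 33  # number of columns in the weather table definition

# The pattern from the failing attempt has no capturing groups at all.
attempted = re.compile(r"\s+")
print(attempted.groups)

# One capturing group per column, separated by runs of whitespace.
# Assumption: every field is a non-empty token without embedded spaces.
pattern_text = r"\s*" + r"\s+".join([r"(\S+)"] * NUM_COLUMNS) + r"\s*"
per_column = re.compile(pattern_text)
print(per_column.groups)

# Hypothetical row with 33 tokens and uneven spacing, for illustration only.
tokens = [f"v{i}" for i in range(NUM_COLUMNS)]
sample = "  ".join(tokens[:3]) + "   " + " ".join(tokens[3:])
match = per_column.fullmatch(sample)
print(match.groups()[:4])
```

A pattern built this way could be tried as the `input.regex` value; backslashes inside a quoted Hive property value generally need to be doubled.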

1 Answer:

Answer 0: (score: 0)

Use the multi-delimiter SerDe (MultiDelimitSerDe):

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES ("field.delim"="\\s+")

Your final query might look like this:

CREATE EXTERNAL TABLE weather (USAF INT, WBAN INT, `Date` STRING, DIR STRING, SPD INT, GUS INT, CLG INT, SKC STRING, L STRING, M STRING, H STRING, VSB DECIMAL, MW1 STRING, MW2 STRING, MW3 STRING, MW4 STRING, AW1 STRING, AW2 STRING, AW3 STRING, AW4 STRING, W STRING, TEMP INT, DEWP INT, SLP DECIMAL, ALT DECIMAL, STP DECIMAL, MAX INT, MIN INT, PCP01 DECIMAL, PCP06 DECIMAL, PCP24 DECIMAL, PCPXX DECIMAL, SD INT)
COMMENT 'weather table'
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES ("field.delim"="\\s+")
STORED AS TEXTFILE
LOCATION '/data/Weather';
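
After creating the table, it is worth checking a few rows to confirm that `field.delim` is being applied as a regular expression rather than as a literal multi-character string, since that depends on the SerDe implementation and Hive version. The Python sketch below only illustrates the difference between the two interpretations; it says nothing about which one Hive uses, and the sample line is invented.

```python
import re

# Hypothetical row with uneven spacing, for illustration only.
line = "690190 13910 200601010000   120"

delim = r"\s+"  # the property value after string unescaping

# Interpreted as a regular expression: splits on runs of whitespace.
as_regex = re.split(delim, line)

# Interpreted as a literal string: the characters backslash, 's', '+'
# never occur in the row, so the row is not split at all.
as_literal = line.split(delim)

print(as_regex)
print(as_literal)
```

If a query returns the whole line in the first column and NULL in the rest, the delimiter is probably being matched literally.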

Hope this helps!!!