Question

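I have some data in HDFS that has no delimiters. That is, the individual data fields are identified by their position within the line.

For example,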

CountryXTOWNYCRIMEVALUEZ

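So here the country occupies positions 0 to 7, the town positions 8 to 12, and the crime statistic positions 13 to 23.

Is there a way to import data organized like this directly into Hive? I suppose one workable approach would be to write a MapReduce job that delimits the data, but I would like to know whether there is a Hive command that can import it directly.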

Answer 1

RegexSerDe

create external table mytable 
( 
    country         string
   ,town            string
   ,crime_statistic string 
)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties  
(
    'input.regex' = '^(.{8})(.{5})(.*)$'
)
location '/...location of the data...'
;

select * from mytable
;

+----------+-------+-----------------+
| country  | town  | crime_statistic |
+----------+-------+-----------------+
| CountryX | TOWNY | CRIMEVALUEZ     |
+----------+-------+-----------------+
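
To sanity-check the pattern before creating the table, it can be tried against the sample line with regexp_extract. This is only a sketch, and it assumes a Hive version that accepts SELECT without a FROM clause (0.13 or later, as far as I know). The substr columns are not part of the answer above; they are added to show the same split done purely by position (substr is 1-based in Hive), which would be an alternative if each line were loaded as a single string column.

```sql
-- Try the regex from 'input.regex' on the sample line, one capture group at a time
select
    regexp_extract('CountryXTOWNYCRIMEVALUEZ', '^(.{8})(.{5})(.*)$', 1) as country
   ,regexp_extract('CountryXTOWNYCRIMEVALUEZ', '^(.{8})(.{5})(.*)$', 2) as town
   ,regexp_extract('CountryXTOWNYCRIMEVALUEZ', '^(.{8})(.{5})(.*)$', 3) as crime_statistic
;

-- Same split by position: 8 characters, then 5, then the rest of the line
select
    substr('CountryXTOWNYCRIMEVALUEZ', 1, 8) as country
   ,substr('CountryXTOWNYCRIMEVALUEZ', 9, 5) as town
   ,substr('CountryXTOWNYCRIMEVALUEZ', 14)   as crime_statistic
;
```

Both queries should return CountryX, TOWNY and CRIMEVALUEZ for the sample line, matching the table output above.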
