Removing/mapping duplicate keys in a Hive table?

Date: 2017-05-26 08:55:03

Tags: json hadoop hive hiveql hive-serde

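I have JSON files that I want to load into a Hive table, but they contain duplicate keys (keys that differ only in case). As a result, all the data comes back invalid, or SELECT queries against the table fail in Hive.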

The JSON files contain records like this:

{"timeSeries":"17051233123","id":"123","timeseries":"17051233123","name":"sample"}

I tried creating the table with a key mapping:

CREATE EXTERNAL TABLE table_hive (`id` STRING, `name` STRING, `timeseries` STRING, `timeseries2` STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "mapping.timeseries2" = "timeSeries") 
LOCATION 'app/jsonfile.json';

How can I turn this into a queryable Hive table?

1 answer:

Answer 0: (score: 0)

Use the JSON SerDe that ships with the Hive distribution:

create external table table_hive 
(
    id          string
   ,name        string   
   ,timeseries  string
)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile
;
select * from table_hive
;
+-----+--------+-------------+
| id  |  name  | timeseries  |
+-----+--------+-------------+
| 123 | sample | 17051233123 |
+-----+--------+-------------+
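
If switching SerDes is not an option, another possibility is to normalize the files before loading them. Hive column names are case-insensitive, so `timeSeries` and `timeseries` end up competing for the same column. The following is a minimal Python sketch, not part of the original answer. It assumes one JSON object per line and that keeping the last value on a clash is acceptable; the function name `dedupe_keys_case_insensitive` is hypothetical.

```python
import json


def dedupe_keys_case_insensitive(line):
    """Return a JSON string with lowercased keys; on a clash the last value wins.

    Hypothetical helper for illustration, assuming one JSON object per line.
    """
    # object_pairs_hook receives every key/value pair of each object,
    # duplicates included, in document order, so nothing is dropped
    # before we get a chance to merge it ourselves.
    def merge(pairs):
        merged = {}
        for key, value in pairs:
            merged[key.lower()] = value
        return merged

    record = json.loads(line, object_pairs_hook=merge)
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    sample = ('{"timeSeries":"17051233123","id":"123",'
              '"timeseries":"17051233123","name":"sample"}')
    print(dedupe_keys_case_insensitive(sample))
    # prints {"id": "123", "name": "sample", "timeseries": "17051233123"}
```

Run over each line of the input, this yields records with a single `timeseries` key, which should then load into the three-column table above without any mapping properties.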