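Hive partitioning and bucketing

Time: 2015-04-08 08:32:22

Tags: file hive

I am new to Hive and want to populate a bucketed table from a flat table. My flat table is as follows: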

create table data(auth string, file string, documents string)
row format delimited
fields terminated by '\t' ;

My bucketed table is as follows:

create table test(auth string, documents string)
partitioned by (file string)
clustered by(auth) into 2 buckets ;

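I have two authors, A and B, with 10 files each. When I insert data into the bucketed table, the statement executes successfully, but the result is not what I want: I want all 10 files of each author in the same partition, and instead I get a single file containing the contents of all 10 files.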

1 answer:

Answer 0 (score: 0)

I am assuming the following table structures.

flattable:

CREATE TABLE flattable (id INT, author STRING, book STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

bucketedtable:

CREATE TABLE bucketedtable (id INT, book STRING)
PARTITIONED BY (author STRING)
CLUSTERED BY (book) INTO 10 BUCKETS;

Set these properties in Hive:

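-- Enforce the table's bucketing on insert (needed on Hive 1.x; Hive 2.0+ always enforces it).
set hive.enforce.bucketing = true;
-- Let partition values come from the SELECT rather than from a literal.
set hive.exec.dynamic.partition=true;
-- Allow all partition columns to be dynamic (strict mode requires at least one static one).
set hive.exec.dynamic.partition.mode=nonstrict;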

Insert into bucketedtable from flattable:

INSERT INTO TABLE bucketedtable
PARTITION (author)
SELECT  id, book, author
FROM flattable;
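
To check the result, something along these lines could be run after the insert. This is only a sketch: the author value 'A' is taken from the question, and the warehouse path assumes the default hive.metastore.warehouse.dir and the default database, so it will likely need adjusting for your setup.

```sql
-- One partition per distinct author is expected, e.g. author=A and author=B.
SHOW PARTITIONS bucketedtable;

-- List the bucket files of one partition from the Hive CLI.
-- Assumed path: default warehouse directory, default database.
dfs -ls /user/hive/warehouse/bucketedtable/author=A;

-- Read back only the first of the 10 buckets for that author.
SELECT id, book
FROM bucketedtable TABLESAMPLE (BUCKET 1 OUT OF 10 ON book)
WHERE author = 'A';
```

Some bucket files may be empty, because rows are assigned to buckets by a hash of book rather than one book per bucket.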
  

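In your table, you just need to swap the partitioning and clustering fields: partition by the author and cluster by the file.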
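
Applied to the tables from the question, the swap might look like the sketch below. The table name test_by_author is hypothetical, and the choice of 10 buckets is an assumption based on the 10 files per author; the column names are those of the question's data table.

```sql
-- Hypothetical table: partition by author, bucket by file.
CREATE TABLE test_by_author (file STRING, documents STRING)
PARTITIONED BY (auth STRING)
CLUSTERED BY (file) INTO 10 BUCKETS;

SET hive.enforce.bucketing = true;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The dynamic partition column (auth) must come last in the SELECT list.
INSERT INTO TABLE test_by_author
PARTITION (auth)
SELECT file, documents, auth
FROM data;
```

Rows are assigned to buckets by hashing file, so two file names can land in the same bucket. Ten buckets therefore does not guarantee exactly one source file per bucket file.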