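I'm new to Hive, so I have a basic question: how do I create a table from a query so that the result is partitioned in a specific way?
For example: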
CREATE TABLE IF NOT EXISTS tbl_x (
x SMALLINT,
y FLOAT)
PARTITIONED BY (id SMALLINT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS ORC;
-- Hive stores the partition column (id) last, so each row is (x, y, id).
INSERT INTO TABLE `tbl_x` PARTITION (id)
VALUES (1, 1.0, 1),
(1, 2.0, 1),
(2, 3.0, 1),
(2, 4.0, 1),
(1, 5.0, 2),
(1, 6.0, 2),
(2, 7.0, 2),
(2, 8.0, 2);
CREATE TABLE tbl_y AS SELECT `id`, `x`, SUM(`y`) AS `y_sum`
FROM `tbl_x`
GROUP BY `id`, `x`;
In this example, I want tbl_y to be partitioned as well.
This attempt does not work:
CREATE TABLE tbl_y AS SELECT `id`, `x`, SUM(`y`) AS `y_sum`
FROM `tbl_x`
GROUP BY `id`, `x` PARTITIONED BY (id SMALLINT);
What is the trick here? Should I define the partitioned table first and then insert the results into it?
Answer 0 (score: 1)
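Yes, you should create the partitioned table first, because creating a partitioned table with CREATE TABLE AS SELECT (CTAS) is not supported.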
CREATE TABLE tbl_y(x smallint,y_sum double)
partitioned by (id smallint)
STORED AS ORC;
If the table schema is the same, you can use CREATE TABLE ... LIKE:
CREATE TABLE tbl_y like tbl_x;
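Note that CREATE TABLE ... LIKE copies the source table's definition, including its partition columns, so it only fits when the target really has the same columns. In the question's example the aggregated column is y_sum rather than y, so the explicit CREATE TABLE above is the closer match. A minimal sketch for checking what LIKE produced, assuming a hypothetical table name tbl_copy that is not part of the original post:

```sql
-- tbl_copy is a hypothetical name used only for this illustration.
CREATE TABLE IF NOT EXISTS tbl_copy LIKE tbl_x;

-- The output should include a "# Partition Information" section listing id.
DESCRIBE FORMATTED tbl_copy;
```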
You can also use DISTRIBUTE BY to spread the data evenly across reducers; see also this answer: https://stackoverflow.com/a/38475807/2700344
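-- partition(id) without a static value requires dynamic partitioning.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- The partition column (id) must be the last column in the select list.
insert overwrite table tbl_y partition(id)
select x, SUM(y) AS y_sum, id
from tbl_x
group by id, x
distribute by id, FLOOR(RAND()*100.0)%20;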
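To confirm that the insert actually produced partitions, something like the following sketch may help. It assumes tbl_y was created and loaded as described above; with the sample data from the question, one would expect to see partitions for id=1 and id=2.

```sql
-- List the partitions that the dynamic-partition insert created.
SHOW PARTITIONS tbl_y;

-- Filtering on the partition column lets Hive read only the matching partition.
SELECT x, y_sum
FROM tbl_y
WHERE id = 1;
```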