Hive - Partitioning by year

Time: 2013-12-10 15:52:29

Tags: sql database hadoop hql hive

I am partitioning a table by year in Hive. I created this script:

DROP TABLE movies_byYear;

CREATE TABLE movies_byYear (title STRING, full_name STRING, ep_name STRING, type STRING, ep_num STRING, suspended BOOLEAN) PARTITIONED BY (year INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

INSERT OVERWRITE TABLE movies_byYear PARTITION (year='2013') SELECT title, full_name, ep_name, type, ep_num, suspended FROM movies WHERE year='2013';

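But when I run: SELECT COUNT(*) FROM movies WHERE year='2013';

I do not get only the movies from 2013; instead I get all movies back.

Is it also possible to let Hive decide where to partition?

I would really appreciate your answers!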

Update

After adding year to the SELECT list, I get:

INSERT OVERWRITE TABLE movies_byYear PARTITION (year=2013) SELECT title, full_name, ep_name, type, ep_num, suspended, year FROM movies WHERE year=2013;

FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different '2013': Table insclause-0 has 6 columns, but query has 7 columns.

1 Answer:

Answer 0: (score: 2)

When inserting, you select:

SELECT title, full_name, ep_name, type, ep_num, suspended

Add the year... At the moment, the year field in movies_byYear is empty...

When you declare year as a partition in Hive's CREATE TABLE statement, year becomes a column of the table!
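One way to check this is to list the table's partitions and filter on year in the partitioned table itself. This is a minimal sketch, assuming the INSERT from the question's script completed successfully. Note that it reads from movies_byYear, whereas the count in the question reads from the source table movies:

```sql
-- List the partitions Hive has registered for the table.
SHOW PARTITIONS movies_byYear;

-- year is a partition column, so it can be filtered like an ordinary column.
-- Only the data under the matching partition needs to be scanned.
SELECT COUNT(*) FROM movies_byYear WHERE year = 2013;
```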

Update

Replace this:

INSERT OVERWRITE TABLE movies_byYear PARTITION (year='2013') SELECT title, full_name, ep_name, type, ep_num, suspended FROM movies WHERE year='2013';

with this:

INSERT OVERWRITE TABLE movies_byYear PARTITION (year=2013) SELECT title, full_name, ep_name, type, ep_num, suspended FROM movies WHERE year='2013';

That is, remove the single quotes around the year value in the PARTITION clause...
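As for letting Hive decide the partition: Hive supports dynamic partitioning, where the partition value is taken from the last column of the SELECT instead of being fixed in the PARTITION clause. Here is a sketch, assuming movies has a year column whose values convert to INT and that dynamic partitioning is permitted on your cluster. The two SET lines are standard Hive properties, but whether you may change them depends on your setup:

```sql
-- Enable dynamic partitioning for this session.
SET hive.exec.dynamic.partition=true;
-- nonstrict mode allows every partition column to be dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- No value is given for year in the PARTITION clause; Hive routes each row
-- to a partition based on the last column of the SELECT (here, year).
INSERT OVERWRITE TABLE movies_byYear PARTITION (year)
SELECT title, full_name, ep_name, type, ep_num, suspended, year
FROM movies;
```

With a static spec such as PARTITION (year=2013), the SELECT must list only the six non-partition columns, which is why the seven-column query in the question's update fails with Error 10044.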