Question

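I have a Hive table that was created by joining data from multiple tables. Its data sits in a folder containing multiple files ("0001_1", "0001_2", ... and so on). I need to create a partitioned table based on a date field in this table, pt_dt, either by altering this table or by creating a new one. Is there a way to do this?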

I tried creating a new table and inserting into it (below), but that did not work:

create external table table2 (acct_id bigint, eval_dt string)
partitioned by (pt_dt string);
insert into table2
partition (pt_dt) 
select acct_id, eval_dt, pt_dt
from jmx948_variable_summary;

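This throws an error:

"FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 189 Cumulative CPU: 401.68 sec HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 6 minutes 41 seconds 680 msec"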

Answer 1

I was able to figure it out after some trial and error.

Enable dynamic partitioning in Hive:

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

Create the schema for the partitioned table:

CREATE TABLE table1 (id STRING, info STRING)
PARTITIONED BY ( tdate STRING);

Insert into the partitioned table:

FROM table2 t2
INSERT OVERWRITE TABLE table1 PARTITION(tdate)
SELECT t2.id, t2.info, t2.tdate
DISTRIBUTE BY tdate;
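
Applied to the table in the question, the same three steps might look like the sketch below. It assumes the source table jmx948_variable_summary exposes acct_id, eval_dt and pt_dt as shown in the question. The target name table2_partitioned is hypothetical, and the two limit settings use assumed values that are only needed if the data holds more distinct pt_dt values than the defaults allow.

```sql
-- Allow partitions to be created from the data itself.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Optional, assumed values: raise the limits if pt_dt has many distinct dates.
SET hive.exec.max.dynamic.partitions = 2000;
SET hive.exec.max.dynamic.partitions.pernode = 500;

-- Hypothetical target table; pt_dt appears only in PARTITIONED BY.
CREATE TABLE table2_partitioned (acct_id BIGINT, eval_dt STRING)
PARTITIONED BY (pt_dt STRING);

-- The partition column goes last in the SELECT.
INSERT OVERWRITE TABLE table2_partitioned PARTITION (pt_dt)
SELECT acct_id, eval_dt, pt_dt
FROM jmx948_variable_summary
DISTRIBUTE BY pt_dt;

-- List the partitions that were created.
SHOW PARTITIONS table2_partitioned;
```

DISTRIBUTE BY pt_dt routes all rows for a given date to the same reducer, so each partition is written by a single task instead of by every mapper, which tends to keep the number of open files per task low.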

Answer 2

This works in the version I am using (Hive 0.14.0.2.2.4.2-2):

INSERT INTO TABLE table1 PARTITION(tdate)
SELECT t2.id, t2.info, t2.tdate
FROM table2 t2;

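The column you want to partition by must be selected last from the source table. In the example above, the date is the last column in the SELECT. Similarly, if you need to partition the table by the column "info", then: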

INSERT INTO TABLE table1 PARTITION(info)
SELECT t2.id, t2.tdate, t2.info
FROM table2 t2;

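If you want to create a table with multiple partition columns, the SELECT must list them last, in the same order as the PARTITION clause. To partition the table above by "date" and then "info":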

INSERT INTO TABLE table1 PARTITION(date, info)
SELECT t2.id, t2.tdate, t2.info
FROM table2 t2;

By "info" and then "date":

INSERT INTO TABLE table1 PARTITION(info, date)
SELECT t2.id, t2.info, t2.tdate
FROM table2 t2;

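In each of the statements above, the rows are selected from the non-partitioned source table (aliased t2).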
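
To make the ordering rule concrete, here is a minimal sketch of a two-level partitioned table. The table name table1_by_info_date is hypothetical, and the source is assumed to be table2 (aliased t2) with columns id, info and tdate, as in Answer 1. Dynamic partition values are matched by position rather than by name, so the trailing SELECT columns need to follow the order given in the PARTITION clause.

```sql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Hypothetical table partitioned first by info, then by tdate.
CREATE TABLE table1_by_info_date (id STRING)
PARTITIONED BY (info STRING, tdate STRING);

-- Regular columns first, then the partition columns in declared order.
INSERT INTO TABLE table1_by_info_date PARTITION (info, tdate)
SELECT t2.id, t2.info, t2.tdate
FROM table2 t2;
```

Because both partition columns are strings here, swapping the last two SELECT columns would not fail; the values would simply end up at the wrong partition level, so the order is worth double-checking.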
