Position of the partition column when inserting into a Hive table from a SELECT

Asked: 2014-02-08 20:28:45

Tags: hadoop hive

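I am studying partitioning in Hive and came across this:

http://www.brentozar.com/archive/2013/03/introduction-to-hive-partitioning/ In this post, the author says: "When inserting data into a partition, it's necessary to include the partition columns as the last columns in the query. The column names in the source query don't need to match the partition column names, but they really do need to be last - there's no way to wire up Hive differently."

I have a query similar to this: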

insert overwrite table MyDestTable PARTITION (partition_date)
select
grid.partition_date,
….

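The query above has been running for a while without errors. As you can see, I select the partition column as the first column. Is this wrong? I tried to confirm the author's statement from other sources, but I have not found any other documentation that says the same thing. Does anyone here know the correct way to do this? As a Hive newbie, I have only been going by whether Hive complains (it does not).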

KS

3 Answers:

Answer 0 (score: 31)

Example:

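-- Enable dynamic partitioning; nonstrict mode allows every partition column to be dynamic
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

drop table if exists tmp.table1;

create table tmp.table1(
col_a string,col_b int)
partitioned by (ptdate string,ptchannel string)
row format delimited
fields terminated by '\t' ;

-- The partition columns (ptdate, ptchannel) come last in the select list,
-- in the same order as in the partition clause
insert overwrite table tmp.table1 partition(ptdate,ptchannel)
select col_a,count(1) col_b,ptdate,ptchannel
from tmp.table2
group by ptdate,ptchannel,col_a ;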
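A related variant, sketched here as an illustration rather than as part of the original answer: when one partition value is known up front, it can be given statically in the PARTITION clause, and only the remaining dynamic partition column is selected (still last). Static partition columns have to come before dynamic ones in the clause. The date literal '2014-02-08' is a made-up value; the tables are the ones from the example above.

```sql
-- ptdate is static (fixed literal), ptchannel is dynamic.
-- Because at least one partition column is static, this form is also
-- accepted when hive.exec.dynamic.partition.mode is left at strict.
set hive.exec.dynamic.partition=true;

insert overwrite table tmp.table1 partition(ptdate='2014-02-08', ptchannel)
select col_a, count(1) col_b, ptchannel   -- only the dynamic column is selected, and it is last
from tmp.table2
where ptdate = '2014-02-08'               -- hypothetical date, matching the static value
group by ptchannel, col_a;
```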

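Answer 1 (score: 9)

The dynamic partition columns must be specified last among the columns in the SELECT statement, and in the same order in which they appear in the PARTITION() clause.

See the Hive wiki for details.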
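To make the rule concrete, here is a minimal sketch using hypothetical tables demo_src and demo_dest (neither is from the question). Hive matches the select list to the target by position, not by name, so a misordered select list is not necessarily rejected; when the types happen to be compatible the rows simply land in the wrong partitions. That is probably also why a query that selects the partition column first can run without any complaint.

```sql
-- demo_src and demo_dest are hypothetical tables, for illustration only.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE demo_dest (
  col_a STRING,
  col_b INT
)
PARTITIONED BY (ptdate STRING, ptchannel STRING);

-- Intended mapping: regular columns first, then the partition columns
-- in the same order as in the PARTITION clause.
INSERT OVERWRITE TABLE demo_dest PARTITION (ptdate, ptchannel)
SELECT col_a, col_b, ptdate, ptchannel
FROM demo_src;

-- Runs as well, but the last two columns are swapped: the values of
-- ptchannel are used as the ptdate partition and vice versa, because
-- the mapping is positional and both columns are strings.
INSERT OVERWRITE TABLE demo_dest PARTITION (ptdate, ptchannel)
SELECT col_a, col_b, ptchannel, ptdate
FROM demo_src;
```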

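Answer 2 (score: 7)

Yes, the partition column must be the last column in the SELECT when inserting data. Hive takes the partition value from the last column.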

CREATE EXTERNAL TABLE temp (
DATA_OWNER STRING,
DISTRICT_CODE STRING,
BILLING_ACCOUNT_NO STRING,
INST_SEQUENCE_NO STRING,
SITE_NUMBER STRING,
MAIN_TEL_NO STRING,
INST_DIRECTORY_CATEGORY STRING,
INST_START_DATE STRING,
INST_CLASSIFICATION STRING,
CUSTOMER_TYPE STRING,
RENTAL_TARIFF_GROUP STRING,
STERLING_TARIFF_GROUP STRING,
SERVING_DP STRING,
INST_TITLE STRING,
INST_FORENAME STRING,
INST_INITIALS STRING,
INST_SURNAME STRING,
INST_HONOURS STRING,
INST_LOCATION_DESCRIPTION STRING,
INST_SUB_PREMISES STRING,
INST_PREMISES_NAME STRING,
INST_THOROUGHFARE_NUMB STRING,
INST_THOROUGHFARE_NAME STRING,
INST_SUB_LOCALITY STRING,
INST_POST_TOWN STRING,
INST_COUNTY STRING,
INST_POST_CODE STRING,
INST_STATUS STRING,
INST_EXCHANGE_GROUP_CODE STRING,
EXCHANGE_CODE STRING
) PARTITIONED BY (TS_LAST_UPDATED STRING)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE LOCATION 'user/entity/site/inbound/CSS_INSTALLATION_PARTITIONED';

INSERT OVERWRITE TABLE temp PARTITION (TS_LAST_UPDATED)
SELECT
DATA_OWNER,
DISTRICT_CODE,
BILLING_ACCOUNT_NO,
INST_SEQUENCE_NO,
SITE_NUMBER,
MAIN_TEL_NO,
INST_DIRECTORY_CATEGORY,
INST_START_DATE,
INST_CLASSIFICATION,
CUSTOMER_TYPE,
RENTAL_TARIFF_GROUP,
STERLING_TARIFF_GROUP,
SERVING_DP,
INST_TITLE,
INST_FORENAME,
INST_INITIALS,
INST_SURNAME,
INST_HONOURS,
INST_LOCATION_DESCRIPTION,
INST_SUB_PREMISES,
INST_PREMISES_NAME,
INST_THOROUGHFARE_NUMB,
INST_THOROUGHFARE_NAME,
INST_SUB_LOCALITY,
INST_POST_TOWN,
INST_COUNTY,
INST_POST_CODE,
INST_STATUS,
INST_EXCHANGE_GROUP_CODE,
EXCHANGE_CODE,
TO_DATE(TS_LAST_UPDATED) -- value for the partition column, selected last
FROM temp1;
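Since Hive may not complain about a misordered select list, one way to check the result is to look at the partitions that were actually created. A small sketch against the temp table from this answer: if the partition values look like dates, the last selected column was the intended one.

```sql
-- List the partitions created by the insert
SHOW PARTITIONS temp;

-- Count the rows that ended up in each partition
SELECT TS_LAST_UPDATED, COUNT(*) AS row_count
FROM temp
GROUP BY TS_LAST_UPDATED;
```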