The file empdetails.log contains the following data:
100 AAA 12000 HYD
101 BBB 13000 PUNE
102 CCC 14000 HYD
103 DDD 10000 BLORE
104 EEE 12000 PUNE
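I want to load this data into the table 'Emp' with dynamic partitioning, so that select * from Emp; gives me the following output (partitioned by location).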
100 AAA 12000 HYD
102 CCC 14000 HYD
101 BBB 13000 PUNE
104 EEE 12000 PUNE
103 DDD 10000 BLORE
Can anyone provide the load command to run in Hive?
Table creation: create table Emp(cid int, cname string, csal int) partitioned by (cloc string) row format delimited fields terminated by '\t' stored as textfile;
Answer 0 (score: 0)
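For dynamic partitioning you have to use an INSERT ... SELECT query (Hive insert); a plain LOAD DATA cannot work out the partition value for each row.
Inserting data into a Hive table with dynamic partitions (DP) is a two-step process: first load the raw file into an unpartitioned staging table, then insert from the staging table into the partitioned table.
Also, set the following properties in Hive.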
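The settings usually needed for a dynamic-partition insert are the two below. This is a sketch of a typical session setup, assumed rather than taken from the answer. Setting the mode to nonstrict is what lets every partition column be dynamic; in strict mode Hive expects at least one static partition value.

```sql
-- Assumed session settings for dynamic-partition inserts (run in the Hive session first)
SET hive.exec.dynamic.partition=true;            -- turn dynamic partitioning on
SET hive.exec.dynamic.partition.mode=nonstrict;  -- allow all partition columns to be dynamic
```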
The following example is for the Cloudera VM.
-- Extract orders data from MySQL (Retail_DB.orders)
select * from orders into outfile '/tmp/orders_data.psv' fields terminated by '|' lines terminated by '\n';
-- Create Hive table with DP - order_month is DP.
CREATE TABLE orders (order_id int, order_date string, order_customer_id int, order_status string) PARTITIONED BY (order_month string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
--Create staging table in Hive.
CREATE TABLE orders_stage (order_id int,order_date string, order_customer_id int, order_status string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
--Load data into staging table (Hive)
load data local inpath '/tmp/orders_data.psv' overwrite into table orders_stage;
--Insert into Orders, which is final table (Hive).
insert overwrite table orders partition (order_month)
select order_id, order_date, order_customer_id, order_status,
substr(order_date, 1, 7) as order_month from orders_stage;
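To check that the insert created one partition per month, something like the following can be run afterwards. It is only a sketch: the month value '2013-07' is a hypothetical example and assumes order_date starts with a YYYY-MM prefix, which is what substr(order_date, 1, 7) relies on.

```sql
-- List the partitions Hive created for the final table
SHOW PARTITIONS orders;

-- Read a single partition; '2013-07' is a hypothetical value, replace it with one listed above
SELECT * FROM orders WHERE order_month = '2013-07' LIMIT 10;
```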
You can find more details at https://cwiki.apache.org/confluence/display/Hive/DynamicPartitions
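Applied to the Emp table from the question, the same two steps might look roughly like this. The staging table name emp_stage and the path /tmp/empdetails.log are assumptions made for illustration, and the sketch assumes the file is tab-delimited to match the table definition. The dynamic partition column (cloc) has to come last in the SELECT list.

```sql
-- Hypothetical staging table: same columns as Emp, with cloc as an ordinary column
CREATE TABLE emp_stage (cid INT, cname STRING, csal INT, cloc STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Load the raw file into staging; the local path is an assumption
LOAD DATA LOCAL INPATH '/tmp/empdetails.log' OVERWRITE INTO TABLE emp_stage;

-- Dynamic-partition insert: Hive derives each row's partition from the last selected column
INSERT OVERWRITE TABLE Emp PARTITION (cloc)
SELECT cid, cname, csal, cloc FROM emp_stage;
```

Each distinct cloc value (HYD, PUNE, BLORE) should end up in its own partition directory. Note that select * from Emp does not guarantee any row order unless an ORDER BY is added, so the grouping by location in the output should not be relied on.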