Inserting data from 2 Hive external tables into a new external table with an additional column

Date: 2016-05-18 12:07:50

Tags: hadoop hive hdfs external-tables

I have 2 external Hive tables, shown below. I populated them with data from Oracle using Sqoop.

create external table transaction_usa
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_usa';

create external table transaction_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_canada';

Now I want to merge the data of the above 2 tables into 1 external Hive table, with all the same fields as the 2 tables above, plus 1 extra column to identify which table each row came from. The new external table has the additional column source_table, and is defined as follows.

create external table transaction_usa_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int,
source_table string
)
row format delimited
stored as textfile
location '/user/gds/bank_ds/tran_usa_canada';

How can I do this?

3 Answers:

Answer 0 (score: 1)

You do a SELECT from each table, apply a UNION ALL to those results, and finally insert the combined result into the third table.

Here is the final Hive query:

INSERT INTO TABLE transaction_usa_canada
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_usa' AS source_table FROM transaction_usa
UNION ALL
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_canada' AS source_table FROM transaction_canada;

Hope this helps!!!
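Note (an addition, not part of the original answer): on Hive versions before 1.2.0, a top-level UNION ALL is not allowed directly under INSERT ... SELECT, so the union may need to be wrapped in a subquery. A sketch of the same query in that form:

-- Same query, with the UNION ALL wrapped in a subquery for older Hive versions
INSERT INTO TABLE transaction_usa_canada
SELECT * FROM (
    SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_usa' AS source_table FROM transaction_usa
    UNION ALL
    SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_canada' AS source_table FROM transaction_canada
) unioned;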

Answer 1 (score: 0)

You can also accomplish this with manual partitioning.

CREATE TABLE transaction_new_table (
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
PARTITIONED BY (sourcetablename String);

Then run the command below,

load data inpath 'hdfspath' into table transaction_new_table   partition(sourcetablename='1')
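Applied to the two tables in the question, the load would be run once per source. This is a sketch only: the partition values below are illustrative, and note that LOAD DATA INPATH moves the files out of the staging directories rather than copying them.

-- One load per source, tagging each with its own partition value
load data inpath '/user/stg/bank_stg/tran_usa' into table transaction_new_table partition(sourcetablename='transaction_usa');
load data inpath '/user/stg/bank_stg/tran_canada' into table transaction_new_table partition(sourcetablename='transaction_canada');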

Answer 2 (score: 0)

You can use Hive's INSERT INTO clause.
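For example (a sketch based on the tables in the question), one INSERT INTO ... SELECT per source table would also populate the target, supplying the source name as a string literal for the extra column:

-- Insert each source table separately, tagging rows with their origin
INSERT INTO TABLE transaction_usa_canada
SELECT t.*, 'transaction_usa' FROM transaction_usa t;

INSERT INTO TABLE transaction_usa_canada
SELECT t.*, 'transaction_canada' FROM transaction_canada t;

Here t.* expands to the 10 columns of the source table, and the literal fills the trailing source_table column.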
