Duplicated Hive table is much larger than the original

Time: 2017-10-25 17:10:59

Tags: sql hive create-table

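I have a table, table1, and I copied it with "create table table2 as select * from table1 where partition_key is not null;". table1 is 463.2 GB, but table2 turned out to be 2.8 TB. Why does this happen?

PS: I just listed the partitions, and it looks like table1 and table2 are partitioned differently. So I am adding to my question: how can I copy a table and keep its original partition information?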

table1: hdfs dfs -du -s -h /user/hive/warehouse/map_services.db/userhistory1/*

7.9 G  23.7 G  /user/hive/warehouse/map_services.db/userhistory/datestr=1970-01-01
25.7 G  77.1 G  /user/hive/warehouse/map_services.db/userhistory/datestr=2017-10-01
18.8 G  56.3 G  /user/hive/warehouse/map_services.db/userhistory/datestr=2017-10-02
16.8 G  50.5 G  /user/hive/warehouse/map_services.db/userhistory/datestr=2017-10-03
17.5 G  52.5 G  /user/hive/warehouse/map_services.db/userhistory/datestr=2017-10-04
18.0 G  53.9 G  /user/hive/warehouse/map_services.db/userhistory/datestr=2017-10-05
22.4 G  67.1 G  /user/hive/warehouse/map_services.db/userhistory/datestr=2017-10-06
27.3 G  81.8 G  /user/hive/warehouse/map_services.db/userhistory/datestr=2017-10-07

table2: hdfs dfs -du -s -h /user/hive/warehouse/map_services.db/userhistory2/*

929.2 M  2.7 G  /user/hive/warehouse/map_services.db/userhistory2/000000_0
651.1 M  1.9 G  /user/hive/warehouse/map_services.db/userhistory2/000001_0
1.1 G  3.3 G  /user/hive/warehouse/map_services.db/userhistory2/000002_0
1.1 G  3.3 G  /user/hive/warehouse/map_services.db/userhistory2/000003_0
1.6 G  4.7 G  /user/hive/warehouse/map_services.db/userhistory2/000004_0
1.3 G  4.0 G  /user/hive/warehouse/map_services.db/userhistory2/000005_0
1.2 G  3.7 G  /user/hive/warehouse/map_services.db/userhistory2/000006_0
1.5 G  4.5 G  /user/hive/warehouse/map_services.db/userhistory2/000007_0
1.5 G  4.4 G  /user/hive/warehouse/map_services.db/userhistory2/000008_0
1.5 G  4.4 G  /user/hive/warehouse/map_services.db/userhistory2/000009_0
1.5 G  4.5 G  /user/hive/warehouse/map_services.db/userhistory2/000010_0
1.4 G  4.3 G  /user/hive/warehouse/map_services.db/userhistory2/000011_0
1.4 G  4.3 G  /user/hive/warehouse/map_services.db/userhistory2/000012_0
1.3 G  3.8 G  /user/hive/warehouse/map_services.db/userhistory2/000013_0
1.5 G  4.4 G  /user/hive/warehouse/map_services.db/userhistory2/000014_0
1.4 G  4.2 G  /user/hive/warehouse/map_services.db/userhistory2/000015_0
1.2 G  3.6 G  /user/hive/warehouse/map_services.db/userhistory2/000016_0
1.5 G  4.5 G  /user/hive/warehouse/map_services.db/userhistory2/000017_0
1.5 G  4.4 G  /user/hive/warehouse/map_services.db/userhistory2/000018_0
1.4 G  4.2 G  /user/hive/warehouse/map_services.db/userhistory2/000019_0
1.5 G  4.6 G  /user/hive/warehouse/map_services.db/userhistory2/000020_0
1.5 G  4.5 G  /user/hive/warehouse/map_services.db/userhistory2/000021_0
1.6 G  4.7 G  /user/hive/warehouse/map_services.db/userhistory2/000022_0
1.3 G  4.0 G  /user/hive/warehouse/map_services.db/userhistory2/000023_0
1.1 G  3.4 G  /user/hive/warehouse/map_services.db/userhistory2/000024_0
908.7 M  2.7 G  /user/hive/warehouse/map_services.db/userhistory2/000025_0
1.4 G  4.2 G  /user/hive/warehouse/map_services.db/userhistory2/000026_0
1.4 G  4.3 G  /user/hive/warehouse/map_services.db/userhistory2/000027_0
1.3 G  3.8 G  /user/hive/warehouse/map_services.db/userhistory2/000028_0
1.4 G  4.1 G  /user/hive/warehouse/map_services.db/userhistory2/000029_0
1.6 G  4.7 G  /user/hive/warehouse/map_services.db/userhistory2/000030_0
1.3 G  4.0 G  /user/hive/warehouse/map_services.db/userhistory2/000031_0
1.3 G  4.0 G  /user/hive/warehouse/map_services.db/userhistory2/000032_0
1.6 G  4.8 G  /user/hive/warehouse/map_services.db/userhistory2/000033_0
1.5 G  4.4 G  /user/hive/warehouse/map_services.db/userhistory2/000034_0
1.3 G  3.8 G  /user/hive/warehouse/map_services.db/userhistory2/000035_0
940.0 M  2.8 G  /user/hive/warehouse/map_services.db/userhistory2/000036_0
1.3 G  4.0 G  /user/hive/warehouse/map_services.db/userhistory2/000037_0
1.2 G  3.6 G  /user/hive/warehouse/map_services.db/userhistory2/000038_0
1.5 G  4.6 G  /user/hive/warehouse/map_services.db/userhistory2/000039_0
1.2 G  3.7 G  /user/hive/warehouse/map_services.db/userhistory2/000040_0
1.1 G  3.4 G  /user/hive/warehouse/map_services.db/userhistory2/000041_0
1.1 G  3.4 G  /user/hive/warehouse/map_services.db/userhistory2/000042_0
1.0 G  3.1 G  /user/hive/warehouse/map_services.db/userhistory2/000043_0
1.4 G  4.3 G  /user/hive/warehouse/map_services.db/userhistory2/000044_0
1.3 G  4.0 G  /user/hive/warehouse/map_services.db/userhistory2/000045_0
1.4 G  4.1 G  /user/hive/warehouse/map_services.db/userhistory2/000046_0
1.5 G  4.5 G  /user/hive/warehouse/map_services.db/userhistory2/000047_0
1.1 G  3.3 G  /user/hive/warehouse/map_services.db/userhistory2/000048_0
706.3 M  2.1 G  /user/hive/warehouse/map_services.db/userhistory2/000049_0
1.4 G  4.2 G  /user/hive/warehouse/map_services.db/userhistory2/000050_0
1.5 G  4.6 G  /user/hive/warehouse/map_services.db/userhistory2/000051_0
872.2 M  2.6 G  /user/hive/warehouse/map_services.db/userhistory2/000052_0
1.2 G  3.5 G  /user/hive/warehouse/map_services.db/userhistory2/000053_0
1.2 G  3.7 G  /user/hive/warehouse/map_services.db/userhistory2/000054_0
943.9 M  2.8 G  /user/hive/warehouse/map_services.db/userhistory2/000055_0
1.6 G  4.7 G  /user/hive/warehouse/map_services.db/userhistory2/000056_0
1.5 G  4.4 G  /user/hive/warehouse/map_services.db/userhistory2/000057_0
1.3 G  4.0 G  /user/hive/warehouse/map_services.db/userhistory2/000058_0
1.4 G  4.3 G  /user/hive/warehouse/map_services.db/userhistory2/000059_0
961.5 M  2.8 G  /user/hive/warehouse/map_services.db/userhistory2/000060_0
1.3 G  3.8 G  /user/hive/warehouse/map_services.db/userhistory2/000061_0
1.4 G  4.3 G  /user/hive/warehouse/map_services.db/userhistory2/000062_0
1.4 G  4.2 G  /user/hive/warehouse/map_services.db/userhistory2/000063_0
1.4 G  4.1 G  /user/hive/warehouse/map_services.db/userhistory2/000064_0
924.4 M  2.7 G  /user/hive/warehouse/map_services.db/userhistory2/000065_0

1 answer:

Answer 0 (score: 1)

Your target table is neither compressed nor partitioned.
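
One way to check this is to compare the metadata of the two tables. The following is a minimal sketch using standard Hive statements; table1 and table2 are the placeholder names from the question, and the exact output layout varies by Hive version.

```sql
-- Show the partition columns, storage format (InputFormat, OutputFormat, SerDe)
-- and HDFS location of each table so they can be compared side by side.
DESCRIBE FORMATTED table1;
DESCRIBE FORMATTED table2;

-- List the partitions of the source table.
SHOW PARTITIONS table1;
```

A plain CREATE TABLE ... AS SELECT with no storage clauses writes an unpartitioned table in the default file format (hive.default.fileformat, TextFile unless configured otherwise), so the copy's format may differ from the source.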

To create a table with the same partitioning, use the following command:

create table table2 like table1;

Enable compression before inserting:

SET hive.exec.compress.output=true;
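
This setting only switches output compression on; which codec is used depends on the cluster configuration. As a hedged sketch, assuming the table is stored as text or SequenceFile and that the Snappy codec is installed on the cluster (both are assumptions, not stated in the question), the codec can be chosen explicitly:

```sql
-- Assumption: Snappy is available; org.apache.hadoop.io.compress.GzipCodec
-- is a common alternative.
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- Only relevant for SequenceFile output: compress per block rather than per record.
SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
```

For columnar formats such as ORC, compression is normally controlled by table properties instead (for example orc.compress), so these settings may have no effect there.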

Insert overwrite with dynamic partitioning:

set hive.exec.dynamic.partition=true;  
set hive.exec.dynamic.partition.mode=nonstrict; 

insert overwrite table table2 partition(partition_key)
select * from table1;
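
After the insert finishes, the result can be sanity-checked. This is a sketch only, using the placeholder names table1, table2, and partition_key from the question:

```sql
-- The copy should now list the same partitions as the source.
SHOW PARTITIONS table2;

-- Row counts per partition should match between the two tables.
SELECT partition_key, COUNT(*) AS row_count FROM table1 GROUP BY partition_key;
SELECT partition_key, COUNT(*) AS row_count FROM table2 GROUP BY partition_key;
```

Note that select * works in the insert because Hive returns partition columns last, which is where a dynamic partition insert expects them.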