Hive creates delta folders and bucket files

Time: 2019-05-08 11:26:33

Tags: hive hiveql

Below is my Hive insert query, which I use to copy data from one table to another:

INSERT INTO target_table PARTITION(eventdate) SELECT col1,col2,.... FROM source_table ;

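The target table was initially empty. My assumption was that, when copying the data, Hive would create one file under each partition, but my directory structure looks like this: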

hdfs dfs -ls /warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000
-rwxrwxrwx+  3 hive hadoop          1 2019-05-08 09:54 hdfs://SugarBOXNNHA/warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000/_orc_acid_version
-rwxrwxrwx+  3 hive hadoop       6071 2019-05-08 09:54 hdfs://SugarBOXNNHA/warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000/bucket_00002
-rwxrwxrwx+  3 hive hadoop       5194 2019-05-08 09:54 hdfs://SugarBOXNNHA/warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000/bucket_00007
-rwxrwxrwx+  3 hive hadoop       6606 2019-05-08 09:54 hdfs://SugarBOXNNHA/warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000/bucket_00008

Table definitions:

CREATE EXTERNAL TABLE IF NOT EXISTS source_table 
(col1 string, col2 string, col3 string,eventdate date) 
PARTITIONED BY (loaddate STRING) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
STORED AS ORC LOCATION '/warehouse/ext_hive_tables/source_table/data';

CREATE TABLE target_table(col1 string, col2 string, col3 string)  
PARTITIONED BY ( eventdate date) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' STORED AS ORC;

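Why does Hive create the "delta_0000001_0000001_0000" folder, and multiple files named "bucket_*" under it? I have not applied bucketing to the table. Is there any way to copy everything into a single file under the partition folder?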
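
One possible reading of the listing above, offered as a sketch rather than a confirmed answer: the `_orc_acid_version` file and the `/warehouse/tablespace/managed/hive` path suggest that target_table is a transactional (ACID) managed table, which is the default for managed ORC tables in Hive 3. In that layout each write lands in a `delta_*` directory, and for an unbucketed table the `bucket_*` file names generally reflect the writer tasks rather than declared buckets. The statements below first check that assumption and then try to route each partition's rows through a single writer. The full column list in the SELECT is an assumption taken from the table definitions (the original query elides it), and whether exactly one file per partition results depends on the Hive version and settings.

```sql
-- Assumption: the session needs non-strict mode for a fully dynamic
-- partition insert (no static partition value is supplied).
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Check whether target_table is transactional: look for
-- transactional=true under "Table Parameters" in the output.
DESCRIBE FORMATTED target_table;

-- Hypothetical rewrite of the insert. The dynamic partition column
-- (eventdate) goes last in the SELECT list, and DISTRIBUTE BY sends all
-- rows with the same eventdate to the same reducer, so each partition
-- should receive output from a single writer.
INSERT INTO target_table PARTITION (eventdate)
SELECT col1, col2, col3, eventdate
FROM source_table
DISTRIBUTE BY eventdate;
```

If the table is transactional, the `delta_*` directory itself is expected and would remain; the sketch only aims to reduce the number of `bucket_*` files written inside it.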

0 answers:

No answers