Below is my Hive insert query, which copies data from one table to another:
INSERT INTO target_table PARTITION(eventdate) SELECT col1,col2,.... FROM source_table ;
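The target table is initially empty. I assumed that while copying the data, Hive would create one file under each partition, but my directory structure looks like this: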
hdfs dfs -ls /warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000
-rwxrwxrwx+ 3 hive hadoop 1 2019-05-08 09:54 hdfs://SugarBOXNNHA/warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000/_orc_acid_version
-rwxrwxrwx+ 3 hive hadoop 6071 2019-05-08 09:54 hdfs://SugarBOXNNHA/warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000/bucket_00002
-rwxrwxrwx+ 3 hive hadoop 5194 2019-05-08 09:54 hdfs://SugarBOXNNHA/warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000/bucket_00007
-rwxrwxrwx+ 3 hive hadoop 6606 2019-05-08 09:54 hdfs://SugarBOXNNHA/warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000/bucket_00008
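To make the listing above easier to read, here is a minimal Python sketch that groups the files by partition and delta directory. It assumes Hive's ACID naming convention, delta_<minWriteId>_<maxWriteId>_<statementId>/bucket_<bucketId>. That convention and the helper names summarize_listing and LISTING are illustrative assumptions, not something taken from Hive itself. The sketch only parses names. It does not explain why Hive chose these particular bucket numbers.

```python
import re
from collections import defaultdict

# Sample input copied from the `hdfs dfs -ls` output above.
LISTING = """\
-rwxrwxrwx+ 3 hive hadoop 1 2019-05-08 09:54 hdfs://SugarBOXNNHA/warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000/_orc_acid_version
-rwxrwxrwx+ 3 hive hadoop 6071 2019-05-08 09:54 hdfs://SugarBOXNNHA/warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000/bucket_00002
-rwxrwxrwx+ 3 hive hadoop 5194 2019-05-08 09:54 hdfs://SugarBOXNNHA/warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000/bucket_00007
-rwxrwxrwx+ 3 hive hadoop 6606 2019-05-08 09:54 hdfs://SugarBOXNNHA/warehouse/tablespace/managed/hive/test.db/target_table/eventdate=2019-04-28/delta_0000001_0000001_0000/bucket_00008
"""

# Assumed ACID layout: <column>=<value>/delta_<minWriteId>_<maxWriteId>_<statementId>/<file>
DELTA_FILE = re.compile(
    r"/(?P<partition>[^/=]+=[^/]+)"
    r"/delta_(?P<min_wid>\d+)_(?P<max_wid>\d+)_(?P<stmt>\d+)"
    r"/(?P<file>[^/]+)$"
)


def summarize_listing(lines):
    """Group files from `hdfs dfs -ls` lines by (partition, minWriteId, maxWriteId, statementId)."""
    summary = defaultdict(lambda: {"buckets": [], "other": []})
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        match = DELTA_FILE.search(fields[-1])  # the path is the last column
        if match is None:
            continue  # not a file inside a delta directory
        key = (
            match["partition"],
            int(match["min_wid"]),
            int(match["max_wid"]),
            int(match["stmt"]),
        )
        name = match["file"]
        if name.startswith("bucket_"):
            summary[key]["buckets"].append(int(name[len("bucket_"):]))
        else:
            summary[key]["other"].append(name)
    return dict(summary)


if __name__ == "__main__":
    for key, files in summarize_listing(LISTING.splitlines()).items():
        print(*key, files)
        # prints: eventdate=2019-04-28 1 1 0 {'buckets': [2, 7, 8], 'other': ['_orc_acid_version']}
```

Read this way, the listing shows a single delta directory (write IDs 1 to 1, statement 0) in partition eventdate=2019-04-28 containing three bucket files plus the _orc_acid_version marker.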
Table definitions:
CREATE EXTERNAL TABLE IF NOT EXISTS source_table
(col1 string, col2 string, col3 string,eventdate date)
PARTITIONED BY (loaddate STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS ORC LOCATION '/warehouse/ext_hive_tables/source_table/data';
CREATE TABLE target_table(col1 string, col2 string, col3 string)
PARTITIONED BY ( eventdate date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' STORED AS ORC;
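Why does Hive create the "delta_0000001_0000001_0000" folder and, inside it, multiple files named "bucket_*"? I have not applied bucketing to the table. Is there any way to copy everything into a single file under each partition folder?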