
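I am trying to copy data (over a billion records) from an RC table (550+ columns) into an ORC table with a new schema (1000+ columns) using an INSERT. These are the settings I used: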

```
-- configure hive.exec for insert.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;
set hive.exec.max.dynamic.partitions=9000;
set mapreduce.map.memory.mb=12000;
set mapreduce.map.java.opts=-Xmx10000m;
set hive.merge.mapfiles=false;

-- Hive command:
INSERT OVERWRITE TABLE orc_table PARTITION (date,category)
SELECT (<list of columns and nulls>) FROM rc_table;
```

This Hive insert takes almost 13 hours to run.

Is there any way to reduce the time it takes? A few options I have already tried:

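Splitting the insert so that each run writes fewer partitions at a time.
Playing with the settings above, which made little difference.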
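
For reference, splitting the insert into smaller batches could look roughly like the sketch below. It assumes that `rc_table` exposes a `date` column that feeds the `date` partition of `orc_table`; the literal `'2015-01-01'` is a made-up placeholder, and the column list is left elided as in the command above.

```sql
-- Sketch only: one batch per date value instead of one insert for everything.
-- Assumption: rc_table has a `date` column matching the target partition key.
-- '2015-01-01' is a hypothetical value; repeat the statement per date (or date range).
-- Newer Hive versions may require backquotes around `date`, which can be a reserved keyword.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE orc_table PARTITION (date, category)
SELECT (<list of columns and nulls>)
FROM rc_table
WHERE date = '2015-01-01';
```

With dynamic partitioning, each such run overwrites only the partitions it actually writes, so the batches can be run one after another.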

Hive insert from an RC table into an ORC table takes too long.
