I am trying to get compression working.
The original table is defined as:
create external table orig_table (col1 String ...... coln String)
.
.
.
partitioned by (pdate string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ( "separatorChar" = "|")
STORED AS TEXTFILE location '/user/path/to/table/';
The table orig_table has about 10 partitions, each with roughly 100 rows.
To compress it, I created a similar table, with the only change being TEXTFILE to ORCFILE:
create external table orig_table_orc (col1 String ...... coln String)
.
.
.
partitioned by (pdate string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ( "separatorChar" = "|")
STORED AS ORCFILE location '/user/path/to/table/';
Then I tried to copy the records over with:
set hive.exec.dynamic.partition.mode=nonstrict;
set mapred.output.compress=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec;
[I have tried other codecs as well, with the same error]
set mapred.output.compression.type=RECORD;
insert overwrite table zip_test.orig_table_orc partition(pdate) select * from default.orig_table;
The error I get is:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col1":value ... "coln":value}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 8 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:81)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:689)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 9 more
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
It works if I make the Hive table a SEQUENCEFILE; is there any workaround other than giving up on ORC? I have seen a couple of questions with the same error, but in Java programs and not in Hive QL.
Answer (score: 4):
Gaah! ORC and CSV are not the same thing!!!
Explaining everything that went wrong here would take hours and a lot of book excerpts about Hadoop and about database technology, so the short answer is: a columnar format such as ORC brings its own serialization, so you must not declare a ROW FORMAT SERDE on it. Your OpenCSVSerde hands the writer plain Text rows, which is exactly the ClassCastException in the stack trace above. Define the table simply as:
create table orig_table_orc
(col1 String ...... coln String)
partitioned by (pdate string)
stored as ORC
location '/where/ever/you/want'
tblproperties ("orc.compress"="ZLIB");
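A minimal follow-up sketch, assuming the table names from the question: once the table is defined without a SERDE clause, a plain dynamic-partition insert is enough to copy the text data into ORC, and the "orc.compress" table property takes care of compression, so none of the mapred.output.compress settings are needed:

-- dynamic partitioning must still be enabled for partition(pdate)
set hive.exec.dynamic.partition.mode=nonstrict;

-- copy all rows; Hive's ORC writer compresses per the table property
insert overwrite table orig_table_orc partition(pdate)
select * from orig_table;

-- optional sanity check: storage format and table properties
describe formatted orig_table_orc;

Note that ZLIB is ORC's default codec anyway; "orc.compress" also accepts SNAPPY and NONE if you want to trade compression ratio for speed.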