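In summary, this is what I did:
Raw data -> select the filtered data and save it in HDFS -> create an external table over the files saved in HDFS -> populate an empty table from the external table.
Looking at the exception, it seems to be related to the OUTPUT type of these two tables.
Details:
1) I have a table "table_log" (in database A) that holds a large amount of data and has the following structure (with 3 partition columns):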
CREATE TABLE `table_log`(
`e_id` string,
`member_id` string,
.
.
PARTITIONED BY (
`dt` string,
`service_type` string,
`event_type` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
COLLECTION ITEMS TERMINATED BY '\u0002'
MAP KEYS TERMINATED BY '\u0003'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
2) I filter the data by (dt, service_type, event_type) and save the result in HDFS as follows:
INSERT OVERWRITE DIRECTORY '/user/atscale/filterd-ratlog'
SELECT * FROM table_log
WHERE dt >= '2016-05-01' AND dt <= '2016-05-31'
  AND service_type = 'xxxx_jp' AND event_type = 'vv';
3) I then created an external table (table_log_filtered_ext) (in database B) over the result above. Note that this table has no partitions.
DROP TABLE IF EXISTS table_log_filtered_ext;
CREATE EXTERNAL TABLE `table_log_filtered_ext`(
`e_id` string,
`member_id` string,
.
.
dt string,
service_type string,
event_type string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
COLLECTION ITEMS TERMINATED BY '\u0002'
MAP KEYS TERMINATED BY '\u0003'
LOCATION '/user/atscale/filterd-ratlog'
4) I created another new table (table_log_filtered) with a structure similar to "table_log" (also with 3 partition columns):
CREATE TABLE `table_log_filtered` (
`e_id` string,
`member_id` string,
.
.
PARTITIONED BY (
`dt` string,
`service_type` string,
`event_type` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
COLLECTION ITEMS TERMINATED BY '\u0002'
MAP KEYS TERMINATED BY '\u0003'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
5) Now I want to populate "table_log_filtered" (which has the same 3 partition columns as "table_log") with the data from the external table "table_log_filtered_ext":
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.execution.engine=tez;
INSERT OVERWRITE TABLE table_log_filtered PARTITION(dt, service_type, event_type)
SELECT * FROM table_log_filtered_ext;
But I get this java.lang.ClassCastException. Looking at the exception, it seems to be related to the OUTPUT type of these two tables. Any tips?
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":
.
.
.
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
... 16 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:81)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:54)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
... 17 more
Answer 0 (score: 0)
In case anyone else runs into this problem: the fix, as @Samson Scharfrichter mentioned, was to specify STORED AS ORC for table_log_filtered:
CREATE TABLE `table_log_filtered` (
`e_id` string,
`member_id` string,
.
.
PARTITIONED BY (
`dt` string,
`service_type` string,
`event_type` string)
STORED AS ORC
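
A likely explanation for why this works: ROW FORMAT DELIMITED makes Hive use its text SerDe (LazySimpleSerDe), which hands Text rows to the writer, while OrcOutputFormat expects rows produced by OrcSerde. That matches the "Text cannot be cast to OrcSerde$OrcSerdeRow" line in the trace. STORED AS ORC sets the SerDe, input format and output format together, so they agree. The statements below are a minimal sketch for checking and repairing an existing table in place, not something from the original thread. They assume table_log_filtered still exists with the old definition and is still empty, so no partition has been written with the mismatched settings.

```sql
-- Inspect the storage settings of the failing table. In the output, compare
-- the "SerDe Library" line with the "InputFormat" and "OutputFormat" lines.
DESCRIBE FORMATTED table_log_filtered;

-- Assumption: the table is still empty. Pointing it at the ORC SerDe should
-- then line the SerDe up with the ORC input/output formats it already declares,
-- as an alternative to dropping and recreating it with STORED AS ORC.
ALTER TABLE table_log_filtered
  SET SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde';
```

If the table already contains partitions, recreating it with STORED AS ORC as shown above is the safer route, since existing partitions keep their own storage settings.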