Hive multi-table join with identical join conditions fails with an error

Date: 2017-01-22 10:09:30

Tags: hadoop hive

I am running several scripts and keep getting the same error. All of them are multi-table joins that use identical join conditions.

The data is stored as Parquet.

Hive version 1.2.1 / MR

SELECT count(*) 
FROM   xxx.tmp_usr_1 m
INNER JOIN xxx.tmp_usr n
ON m.date_id = n.date_id AND m.end_user_id = n.end_user_id
LEFT JOIN xxx.usr_2 p
ON m.date_id = p.date_id AND m.end_user_id = p.end_user_id;

Here is the error message:


2017-01-22 16:47:55,208 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 560.81 sec
2017-01-22 16:47:56,248 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 577.74 sec
2017-01-22 16:47:57,290 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 446.32 sec
MapReduce Total cumulative CPU time: 7 minutes 26 seconds 320 msec
Ended Job = job_1484710871657_6350 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1484710871657_6350_m_000061 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000069 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000053 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000011 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000063 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000049 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000052 (and more) from job job_1484710871657_6350

Task with the most failures (4):
-----
Task ID: task_1484710871657_6350_m_000071
URL: http://xxxxxxxxxx/taskdetails.jsp?jobid=job_1484710871657_6350&tipid=task_1484710871657_6350_m_000071
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:213)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:333)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252)
    ... 11 more
Caused by: java.lang.IllegalStateException: Invalid schema data type, found: PRIMITIVE, expected: STRUCT
    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getProjectedGroupFields(DataWritableReadSupport.java:118)
    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:156)
    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:222)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:99)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)
    ... 16 more
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

My data contains about 20M records. When I tried joining the tables on a single column (end_user_id), I got the same error.

The join columns have the same data type. Performing the join with B inside a subquery, and then joining C against its result, works around the problem.
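The subquery workaround described above might look like the following sketch (table and column names are taken from the original query; this is an assumed rewrite, not the asker's exact script):

```sql
-- Hypothetical form of the workaround: materialize the first join in a
-- subquery, then LEFT JOIN the third table against that result instead
-- of joining all three tables in a single FROM clause.
SELECT count(*)
FROM (
  SELECT m.date_id, m.end_user_id
  FROM   xxx.tmp_usr_1 m
  INNER JOIN xxx.tmp_usr n
  ON m.date_id = n.date_id AND m.end_user_id = n.end_user_id
) t
LEFT JOIN xxx.usr_2 p
ON t.date_id = p.date_id AND t.end_user_id = p.end_user_id;
```

Forcing the intermediate result through a subquery changes how Hive plans the combined splits, which is presumably why the Parquet schema mismatch no longer surfaces.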

We have many SQL queries with multi-table join statements that use identical join conditions, but only a few of the scripts run into these errors.

1 Answer:

Answer 0 (score: 2)

Make sure the join-key columns have the same data type in every table involved.
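One simple way to check this (a sketch, assuming access to the same `xxx` database from the Hive CLI or Beeline) is to describe each table and compare the declared types of the join keys:

```sql
-- Compare the declared types of date_id and end_user_id in each table.
DESCRIBE xxx.tmp_usr_1;
DESCRIBE xxx.tmp_usr;
DESCRIBE xxx.usr_2;
```

If, say, `end_user_id` is a STRING in one table and a BIGINT in another, or the Hive metastore type no longer matches the type written into the underlying Parquet files, the Parquet reader can fail with exactly the "found: PRIMITIVE, expected: STRUCT" schema error seen above.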