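Hive throws an "incorrect header check" error

Time: 2015-09-18 18:38:39

Tags: hadoop hive

I am setting up a Hadoop cluster for evaluation purposes and am working through the QWI example found here. I created my table in Hive: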

CREATE EXTERNAL TABLE qwi2 (
periodicity varchar(256)  COMMENT 'Periodicity of report', 
seasonadj varchar(256)  COMMENT 'Seasonal Adjustment Indicator', 
geo_level varchar(256)  COMMENT 'Group: Geographic level of aggregation', 
geography varchar(256)  COMMENT 'Group: Geography code', 
ind_level varchar(256)  COMMENT 'Group: Industry level of aggregation', 
industry varchar(256)  COMMENT 'Group: Industry code', 
ownercode varchar(256)  COMMENT 'Group: Ownership group code', 
sex varchar(256)  COMMENT 'Group: Gender code', 
agegrp varchar(256)  COMMENT 'Group: Age group code (WIA)', 
race varchar(256)  COMMENT 'Group: race', 
ethnicity varchar(256)  COMMENT 'Group: ethnicity', 
education varchar(256)  COMMENT 'Group: education', 
firmage varchar(256)  COMMENT 'Group: Firm Age group', 
firmsize varchar(256)  COMMENT 'Group: Firm Size group', 
year int  COMMENT 'Time: Year', 
quarter int  COMMENT 'Time: Quarter', 
Emp int  COMMENT 'Employment: Counts', 
EmpEnd int  COMMENT 'Employment end-of-quarter: Counts', 
EmpS int  COMMENT 'Employment stable jobs: Counts', 
EmpTotal int  COMMENT 'Employment reference quarter: Counts', 
EmpSpv int  COMMENT 'Employment stable jobs - previous quarter: Counts', 
HirA int  COMMENT 'Hires All: Counts', 
HirN int  COMMENT 'Hires New: Counts', 
HirR int  COMMENT 'Hires Recalls: Counts', 
Sep int  COMMENT 'Separations: Counts', 
HirAEnd int  COMMENT 'End-of-quarter hires', 
SepBeg int  COMMENT 'Beginning-of-quarter separations', 
HirAEndRepl int  COMMENT 'Replacement hires', 
HirAEndR int  COMMENT 'End-of-quarter hiring rate', 
SepBegR int  COMMENT 'Beginning-of-quarter separation rate', 
HirAEndReplR int  COMMENT 'Replacement hiring rate', 
HirAS int  COMMENT 'Hires All stable jobs: Counts', 
HirNS int  COMMENT 'Hires New stable jobs: Counts', 
SepS int  COMMENT 'Separations stable jobs: Counts', 
SepSnx int  COMMENT 'Separations stable jobs - next quarter: Counts', 
TurnOvrS int  COMMENT 'Turnover stable jobs: Ratio', 
FrmJbGn int  COMMENT 'Firm Job Gains: Counts', 
FrmJbLs int  COMMENT 'Firm Job Loss: Counts', 
FrmJbC int  COMMENT 'Firm jobs change: Net Change', 
FrmJbGnS int  COMMENT 'Firm Gain stable jobs: Counts', 
FrmJbLsS int  COMMENT 'Firm Loss stable jobs: Counts', 
FrmJbCS int  COMMENT 'Firm stable jobs change: Net Change', 
EarnS int  COMMENT 'Employees stable jobs: Average monthly earnings', 
EarnBeg int  COMMENT 'Employees beginning-of-quarter : Average monthly earnings', 
EarnHirAS int  COMMENT 'Hires All stable jobs: Average monthly earnings', 
EarnHirNS int  COMMENT 'Hires New stable jobs: Average monthly earnings', 
EarnSepS int  COMMENT 'Separations stable jobs: Average monthly earnings', 
Payroll int  COMMENT 'Total quarterly payroll: Sum', 
sEmp int  COMMENT 'Status: Employment: Counts', 
sEmpEnd int  COMMENT 'Status: Employment end-of-quarter: Counts', 
sEmpS int  COMMENT 'Status: Employment stable jobs: Counts', 
sEmpTotal int  COMMENT 'Status: Employment reference quarter: Counts', 
sEmpSpv int  COMMENT 'Status: Employment stable jobs - previous quarter: Counts', 
sHirA int  COMMENT 'Status: Hires All: Counts', 
sHirN int  COMMENT 'Status: Hires New: Counts', 
sHirR int  COMMENT 'Status: Hires Recalls: Counts', 
sSep int  COMMENT 'Status: Separations: Counts', 
sHirAEnd int  COMMENT 'Status: End-of-quarter hires', 
sSepBeg int  COMMENT 'Status: Beginning-of-quarter separations', 
sHirAEndRepl int  COMMENT 'Status: Replacement hires', 
sHirAEndR int  COMMENT 'Status: End-of-quarter hiring rate', 
sSepBegR int  COMMENT 'Status: Beginning-of-quarter separation rate', 
sHirAEndReplR int  COMMENT 'Status: Replacement hiring rate', 
sHirAS int  COMMENT 'Status: Hires All stable jobs: Counts', 
sHirNS int  COMMENT 'Status: Hires New stable jobs: Counts', 
sSepS int  COMMENT 'Status: Separations stable jobs: Counts', 
sSepSnx int  COMMENT 'Status: Separations stable jobs - next quarter: Counts', 
sTurnOvrS int  COMMENT 'Status: Turnover stable jobs: Ratio', 
sFrmJbGn int  COMMENT 'Status: Firm Job Gains: Counts', 
sFrmJbLs int  COMMENT 'Status: Firm Job Loss: Counts', 
sFrmJbC int  COMMENT 'Status: Firm jobs change: Net Change', 
sFrmJbGnS int  COMMENT 'Status: Firm Gain stable jobs: Counts', 
sFrmJbLsS int  COMMENT 'Status: Firm Loss stable jobs: Counts', 
sFrmJbCS int  COMMENT 'Status: Firm stable jobs change: Net Change', 
sEarnS int  COMMENT 'Status: Employees stable jobs: Average monthly earnings', 
sEarnBeg int  COMMENT 'Status: Employees beginning-of-quarter : Average monthly earnings', 
sEarnHirAS int  COMMENT 'Status: Hires All stable jobs: Average monthly earnings', 
sEarnHirNS int  COMMENT 'Status: Hires New stable jobs: Average monthly earnings', 
sEarnSepS int  COMMENT 'Status: Separations stable jobs: Average monthly earnings', 
sPayroll int  COMMENT 'Status: Total quarterly payroll: Sum' 
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/lrichards/hive/censusqwi'
TBLPROPERTIES ('skip.header.line.count'='1');

I downloaded a series of .gz files from the Census download server. When I run a simple query:

SELECT *
FROM qwi2
LIMIT 100;

I get the expected results.

However, when I use the example query from the URL linked above:

SELECT Year, Avg(EarnS)
FROM    qwi2
GROUP BY Year
Order BY Year;

I receive the following error:

INFO : Tez session hasn't been created yet. Opening session
INFO : 

INFO : Status: Running (Executing on YARN cluster with App id application_1442592050507_0011)

INFO : Map 1: -/-   Reducer 2: 0/6  Reducer 3: 0/1  
INFO : Map 1: 0/1   Reducer 2: 0/6  Reducer 3: 0/1  
INFO : Map 1: 0(+1)/1   Reducer 2: 0/6  Reducer 3: 0/1  
INFO : Map 1: 0(+1)/1   Reducer 2: 0/6  Reducer 3: 0/1  
INFO : Map 1: 0(+1)/1   Reducer 2: 0/6  Reducer 3: 0/1  
INFO : Map 1: 0(+1,-1)/1    Reducer 2: 0/6  Reducer 3: 0/1  
INFO : Map 1: 0(+1,-1)/1    Reducer 2: 0/6  Reducer 3: 0/1  
INFO : Map 1: 0(+1,-1)/1    Reducer 2: 0/6  Reducer 3: 0/1  
INFO : Map 1: 0(+1,-2)/1    Reducer 2: 0/6  Reducer 3: 0/1  
INFO : Map 1: 0(+1,-2)/1    Reducer 2: 0/6  Reducer 3: 0/1  
INFO : Map 1: 0(+1,-2)/1    Reducer 2: 0/6  Reducer 3: 0/1  
INFO : Map 1: 0(+1,-3)/1    Reducer 2: 0/6  Reducer 3: 0/1  
INFO : Map 1: 0(+1,-3)/1    Reducer 2: 0/6  Reducer 3: 0/1  
INFO : Map 1: 0(+1,-3)/1    Reducer 2: 0/6  Reducer 3: 0/1  
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1442592050507_0011_1_00, diagnostics=[Task failed, taskId=task_1442592050507_0011_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
Caused by: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
... 16 more
Caused by: java.io.IOException: incorrect header check
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
... 22 more
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
Caused by: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
... 16 more
Caused by: java.io.IOException: incorrect header check
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
... 22 more
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
Caused by: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
... 16 more
Caused by: java.io.IOException: incorrect header check
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
... 22 more
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
Caused by: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
... 16 more
Caused by: java.io.IOException: incorrect header check
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
... 22 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1442592050507_0011_1_00 [Map 1] killed/failed due to:null]
ERROR : Vertex killed, vertexName=Reducer 3, vertexId=vertex_1442592050507_0011_1_02, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1442592050507_0011_1_02 [Reducer 3] killed/failed due to:null]
ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1442592050507_0011_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1442592050507_0011_1_01 [Reducer 2] killed/failed due to:null]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:2

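I have tested the files with 7zip, and I have also decompressed these same files and loaded them into SQL to run comparison tests between Hadoop and SQL. It seems odd that a simple SELECT works while the other query does not. What am I doing wrong?

2 Answers:

Answer 0: (score: 1)

I ran into the same error. I could read the first few records, but I could not count them: the record count failed with the same error.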

I solved the problem simply by renaming my plain (uncompressed) file so that it ends in .txt. Previously the file had a different name; I renamed it to one with a .txt extension. Also, if you decompress any of the files and test with it, you can read the data from it.

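If you want to test this, run the record count described above. It performs a full scan, so it will tell you exactly whether the data was loaded correctly.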
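The rename helps because, as far as I know, Hadoop's text input path picks a decompression codec from the file name suffix rather than from the file contents, so a plain file with a compressed-looking name is handed to the zlib decompressor and fails its header check. Below is a minimal Python sketch for checking files locally before uploading them. The function names and the demo file names are hypothetical, and it only covers the `.gz` case.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def extension_matches_content(path):
    """Return True when a '.gz' suffix is present exactly when the content is gzip."""
    return path.endswith(".gz") == looks_like_gzip(path)


if __name__ == "__main__":
    # Hypothetical demo files, created in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        real_gz = os.path.join(tmp, "real.csv.gz")
        fake_gz = os.path.join(tmp, "plain.csv.gz")  # plain text, misleading suffix
        with gzip.open(real_gz, "wb") as f:
            f.write(b"year,EarnS\n2010,100\n")
        with open(fake_gz, "wb") as f:
            f.write(b"year,EarnS\n2010,100\n")
        for p in (real_gz, fake_gz):
            print(os.path.basename(p), extension_matches_content(p))
```

A file for which `extension_matches_content` returns False is a candidate for either renaming (if it is plain text) or re-compressing (if it really should be gzip).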

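Answer 1: (score: 0)

What is most likely happening is data corruption. The first SELECT statement is evaluated lazily and returns only 100 rows, so it never reads to the end of the data.

A quick way to verify this is to run 'select count(*) from qwi2', which performs a full table scan.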
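The same idea can be applied to a single file outside Hive: reading only the beginning of a damaged archive may succeed, while reading it to the end exposes the problem. The following is a minimal Python sketch, assuming the file is available on the local filesystem; `count_lines_fully` is a hypothetical helper name, not part of Hive or Hadoop.

```python
import gzip
import zlib


def count_lines_fully(path, chunk_size=1 << 20):
    """Decompress the whole file and return (line_count, error).

    error is None when the stream decompressed cleanly to the end;
    otherwise it is the exception raised, and line_count covers only
    the part that was read before the failure.
    """
    lines = 0
    try:
        with gzip.open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                lines += chunk.count(b"\n")
    except (OSError, EOFError, zlib.error) as exc:
        # OSError: not a gzip file; EOFError: truncated; zlib.error: corrupt data.
        return lines, exc
    return lines, None
```

Calling `count_lines_fully` on each downloaded file and checking that the second element of the result is None gives a rough local equivalent of the full table scan.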