I am setting up a Hadoop cluster for evaluation purposes and am working through the QWI example found here. I created my table in Hive:
CREATE EXTERNAL TABLE qwi2 (
periodicity varchar(256) COMMENT 'Periodicity of report',
seasonadj varchar(256) COMMENT 'Seasonal Adjustment Indicator',
geo_level varchar(256) COMMENT 'Group: Geographic level of aggregation',
geography varchar(256) COMMENT 'Group: Geography code',
ind_level varchar(256) COMMENT 'Group: Industry level of aggregation',
industry varchar(256) COMMENT 'Group: Industry code',
ownercode varchar(256) COMMENT 'Group: Ownership group code',
sex varchar(256) COMMENT 'Group: Gender code',
agegrp varchar(256) COMMENT 'Group: Age group code (WIA)',
race varchar(256) COMMENT 'Group: race',
ethnicity varchar(256) COMMENT 'Group: ethnicity',
education varchar(256) COMMENT 'Group: education',
firmage varchar(256) COMMENT 'Group: Firm Age group',
firmsize varchar(256) COMMENT 'Group: Firm Size group',
year int COMMENT 'Time: Year',
quarter int COMMENT 'Time: Quarter',
Emp int COMMENT 'Employment: Counts',
EmpEnd int COMMENT 'Employment end-of-quarter: Counts',
EmpS int COMMENT 'Employment stable jobs: Counts',
EmpTotal int COMMENT 'Employment reference quarter: Counts',
EmpSpv int COMMENT 'Employment stable jobs - previous quarter: Counts',
HirA int COMMENT 'Hires All: Counts',
HirN int COMMENT 'Hires New: Counts',
HirR int COMMENT 'Hires Recalls: Counts',
Sep int COMMENT 'Separations: Counts',
HirAEnd int COMMENT 'End-of-quarter hires',
SepBeg int COMMENT 'Beginning-of-quarter separations',
HirAEndRepl int COMMENT 'Replacement hires',
HirAEndR int COMMENT 'End-of-quarter hiring rate',
SepBegR int COMMENT 'Beginning-of-quarter separation rate',
HirAEndReplR int COMMENT 'Replacement hiring rate',
HirAS int COMMENT 'Hires All stable jobs: Counts',
HirNS int COMMENT 'Hires New stable jobs: Counts',
SepS int COMMENT 'Separations stable jobs: Counts',
SepSnx int COMMENT 'Separations stable jobs - next quarter: Counts',
TurnOvrS int COMMENT 'Turnover stable jobs: Ratio',
FrmJbGn int COMMENT 'Firm Job Gains: Counts',
FrmJbLs int COMMENT 'Firm Job Loss: Counts',
FrmJbC int COMMENT 'Firm jobs change: Net Change',
FrmJbGnS int COMMENT 'Firm Gain stable jobs: Counts',
FrmJbLsS int COMMENT 'Firm Loss stable jobs: Counts',
FrmJbCS int COMMENT 'Firm stable jobs change: Net Change',
EarnS int COMMENT 'Employees stable jobs: Average monthly earnings',
EarnBeg int COMMENT 'Employees beginning-of-quarter : Average monthly earnings',
EarnHirAS int COMMENT 'Hires All stable jobs: Average monthly earnings',
EarnHirNS int COMMENT 'Hires New stable jobs: Average monthly earnings',
EarnSepS int COMMENT 'Separations stable jobs: Average monthly earnings',
Payroll int COMMENT 'Total quarterly payroll: Sum',
sEmp int COMMENT 'Status: Employment: Counts',
sEmpEnd int COMMENT 'Status: Employment end-of-quarter: Counts',
sEmpS int COMMENT 'Status: Employment stable jobs: Counts',
sEmpTotal int COMMENT 'Status: Employment reference quarter: Counts',
sEmpSpv int COMMENT 'Status: Employment stable jobs - previous quarter: Counts',
sHirA int COMMENT 'Status: Hires All: Counts',
sHirN int COMMENT 'Status: Hires New: Counts',
sHirR int COMMENT 'Status: Hires Recalls: Counts',
sSep int COMMENT 'Status: Separations: Counts',
sHirAEnd int COMMENT 'Status: End-of-quarter hires',
sSepBeg int COMMENT 'Status: Beginning-of-quarter separations',
sHirAEndRepl int COMMENT 'Status: Replacement hires',
sHirAEndR int COMMENT 'Status: End-of-quarter hiring rate',
sSepBegR int COMMENT 'Status: Beginning-of-quarter separation rate',
sHirAEndReplR int COMMENT 'Status: Replacement hiring rate',
sHirAS int COMMENT 'Status: Hires All stable jobs: Counts',
sHirNS int COMMENT 'Status: Hires New stable jobs: Counts',
sSepS int COMMENT 'Status: Separations stable jobs: Counts',
sSepSnx int COMMENT 'Status: Separations stable jobs - next quarter: Counts',
sTurnOvrS int COMMENT 'Status: Turnover stable jobs: Ratio',
sFrmJbGn int COMMENT 'Status: Firm Job Gains: Counts',
sFrmJbLs int COMMENT 'Status: Firm Job Loss: Counts',
sFrmJbC int COMMENT 'Status: Firm jobs change: Net Change',
sFrmJbGnS int COMMENT 'Status: Firm Gain stable jobs: Counts',
sFrmJbLsS int COMMENT 'Status: Firm Loss stable jobs: Counts',
sFrmJbCS int COMMENT 'Status: Firm stable jobs change: Net Change',
sEarnS int COMMENT 'Status: Employees stable jobs: Average monthly earnings',
sEarnBeg int COMMENT 'Status: Employees beginning-of-quarter : Average monthly earnings',
sEarnHirAS int COMMENT 'Status: Hires All stable jobs: Average monthly earnings',
sEarnHirNS int COMMENT 'Status: Hires New stable jobs: Average monthly earnings',
sEarnSepS int COMMENT 'Status: Separations stable jobs: Average monthly earnings',
sPayroll int COMMENT 'Status: Total quarterly payroll: Sum'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/lrichards/hive/censusqwi'
TBLPROPERTIES ('skip.header.line.count'='1');
I pulled a series of .gz files from the Census download server. When I run a simple query:
SELECT *
FROM qwi2
LIMIT 100;
I get the expected results.
However, when I use the sample query from the URL linked above:
SELECT Year, Avg(EarnS)
FROM qwi2
GROUP BY Year
Order BY Year;
I get the following error:
INFO : Tez session hasn't been created yet. Opening session
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1442592050507_0011)
INFO : Map 1: -/- Reducer 2: 0/6 Reducer 3: 0/1
INFO : Map 1: 0/1 Reducer 2: 0/6 Reducer 3: 0/1
INFO : Map 1: 0(+1)/1 Reducer 2: 0/6 Reducer 3: 0/1
INFO : Map 1: 0(+1)/1 Reducer 2: 0/6 Reducer 3: 0/1
INFO : Map 1: 0(+1)/1 Reducer 2: 0/6 Reducer 3: 0/1
INFO : Map 1: 0(+1,-1)/1 Reducer 2: 0/6 Reducer 3: 0/1
INFO : Map 1: 0(+1,-1)/1 Reducer 2: 0/6 Reducer 3: 0/1
INFO : Map 1: 0(+1,-1)/1 Reducer 2: 0/6 Reducer 3: 0/1
INFO : Map 1: 0(+1,-2)/1 Reducer 2: 0/6 Reducer 3: 0/1
INFO : Map 1: 0(+1,-2)/1 Reducer 2: 0/6 Reducer 3: 0/1
INFO : Map 1: 0(+1,-2)/1 Reducer 2: 0/6 Reducer 3: 0/1
INFO : Map 1: 0(+1,-3)/1 Reducer 2: 0/6 Reducer 3: 0/1
INFO : Map 1: 0(+1,-3)/1 Reducer 2: 0/6 Reducer 3: 0/1
INFO : Map 1: 0(+1,-3)/1 Reducer 2: 0/6 Reducer 3: 0/1
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1442592050507_0011_1_00, diagnostics=[Task failed, taskId=task_1442592050507_0011_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
Caused by: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
... 16 more
Caused by: java.io.IOException: incorrect header check
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
... 22 more
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
Caused by: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
... 16 more
Caused by: java.io.IOException: incorrect header check
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
... 22 more
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
Caused by: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
... 16 more
Caused by: java.io.IOException: incorrect header check
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
... 22 more
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 14 more
Caused by: java.io.IOException: java.io.IOException: incorrect header check
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
... 16 more
Caused by: java.io.IOException: incorrect header check
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:246)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
... 22 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1442592050507_0011_1_00 [Map 1] killed/failed due to:null]
ERROR : Vertex killed, vertexName=Reducer 3, vertexId=vertex_1442592050507_0011_1_02, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1442592050507_0011_1_02 [Reducer 3] killed/failed due to:null]
ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1442592050507_0011_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1442592050507_0011_1_01 [Reducer 2] killed/failed due to:null]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:2
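I have tested the files with 7zip, and I have also decompressed these same files and loaded them into SQL so that I can run comparison tests between Hadoop and SQL. It seems odd that a simple SELECT works while the other query does not. What am I doing wrong?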
Answer 0: (score: 1)
I ran into the same error: I could read the first few records, but a count of the records failed with the same error.
I solved the problem simply by renaming my plain (uncompressed) file so that it has a .txt extension. Also, if you decompress any of the files as a test, you will be able to read data from it.
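If you want to test this, run the record count described above: it performs a full scan, so it tells you accurately whether the data loaded correctly.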
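A quick local check along these lines can show whether a file's extension matches its content. As far as I know, Hadoop's text input picks the decompression codec from the file extension, so a plain file named .gz (or a .gz file that is not really gzip) is a plausible cause of "incorrect header check". This is only a sketch: the helper names and the demo file names are made up for illustration, and it works on local copies of the files, not on HDFS.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def find_mislabeled(directory):
    """List files whose .gz extension disagrees with their actual content."""
    mismatched = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        if name.endswith(".gz") != looks_like_gzip(path):
            mismatched.append(name)
    return mismatched


if __name__ == "__main__":
    # Demo on throwaway files; these file names are hypothetical.
    with tempfile.TemporaryDirectory() as tmp:
        with gzip.open(os.path.join(tmp, "good.csv.gz"), "wt") as fh:
            fh.write("year,EarnS\n2001,3000\n")
        with open(os.path.join(tmp, "plain.csv.gz"), "w") as fh:
            fh.write("year,EarnS\n2001,3000\n")  # plain text, wrong extension
        print(find_mislabeled(tmp))
```

Any file it reports should be either renamed to match its real format or re-downloaded before it goes back into the table location.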
Answer 1: (score: 0)
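What is most likely happening is data corruption. The first SELECT statement is lazy and returns only 100 rows (it never reads to the end of the data).
A quick way to verify this is to run 'select count(*) from qwi2', which forces a full table scan.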
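The difference between the two queries can be imitated outside Hive. The sketch below is not Hive code: it uses Python's gzip module and in-memory byte strings as stand-ins for the table's files, and the function names and sample rows are invented for illustration. A LIMIT-style read stops after the first good file, while a COUNT-style read has to open every file and so hits the bad one.

```python
import gzip
import io
import itertools


def rows(blobs):
    """Lazily yield text lines from a sequence of gzip-compressed byte blobs."""
    for blob in blobs:
        with gzip.open(io.BytesIO(blob), "rt") as fh:
            yield from fh


def limit(blobs, n):
    """Like SELECT * ... LIMIT n: stop as soon as n rows have been read."""
    return list(itertools.islice(rows(blobs), n))


def count(blobs):
    """Like SELECT COUNT(*): must read every row of every blob."""
    return sum(1 for _ in rows(blobs))


# A valid gzip "file" with 200 rows, followed by one that is not gzip at all.
good = gzip.compress(b"".join(b"%d,3000\n" % y for y in range(1990, 2190)))
bad = b"year,EarnS\n2001,3000\n"  # plain text where gzip data is expected

print(len(limit([good, bad], 100)))  # succeeds: only the first blob is read
try:
    count([good, bad])
except OSError as err:  # gzip raises an OSError subclass on a bad header
    print("full scan failed:", err)
```

If the count fails in Hive, checking the files one at a time (for example by decompressing each local copy) narrows down which one is damaged or mislabeled.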