java.io.IOException: Could not get input splits when running a MapReduce program

Asked: 2014-04-22 12:12:49

Tags: java hadoop cassandra datastax-enterprise

I am running a MapReduce program and hit the following error.

14/04/22 07:44:02 INFO mapred.JobClient: Cleaning up the staging area cfs://XX.XXX.XXX.XXX/tmp/hadoop-cassandra/mapred/staging/psadmin/.staging/job_201404180932_0063
14/04/22 07:44:02 ERROR security.UserGroupInformation: PriviledgedActionException as:psadmin cause:java.io.IOException: Could not get input splits
Exception in thread "main" java.io.IOException: Could not get input splits
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:193)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Unknown Source)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
        at MultiOutMR.run(MultiOutMR.java:95)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at MultiOutMR.main(MultiOutMR.java:36)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
        at java.util.concurrent.FutureTask.report(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:189)
        ... 19 more
Caused by: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSubSplits(AbstractColumnFamilyInputFormat.java:304)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.access$200(AbstractColumnFamilyInputFormat.java:60)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:226)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:211)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_splits_ex(Cassandra.java:1359)
        at org.apache.cassandra.thrift.Cassandra$Client.describe_splits_ex(Cassandra.java:1343)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSubSplits(AbstractColumnFamilyInputFormat.java:281)
        ... 7 more

Note:
Prerequisites used:
DataStax Enterprise (DSE 3.2.5) with Cassandra 1.2.15.1 and Hadoop 1.0.4.9
We have configured a single data center with 4 nodes. The nodetool status output is shown below:

XXXXXX@XXXXXXXXX:~$ nodetool status
Datacenter: XXXXXX

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Owns   Host ID                               Token                                    Rack
UN  XX.XXX.XXX.XXX  14.65 MB   25.0%  XX.XXX.XXX.XXX vm01
UN  XX.XXX.XXX.XXX  34.25 MB   25.0%  XX.XXX.XXX.XXX vm01
UN  XX.XXX.XXX.XXX  57.45 MB   25.0%  XX.XXX.XXX.XXX vm01
UN  XX.XXX.XXX.XXX  57.08 MB   25.0%  XX.XXX.XXX.XXX vm01

Can someone help resolve this issue? Thanks in advance.

1 Answer:

Answer 0 (score: 0)

You need to provide more information about how you are setting up the Hadoop job. This looks more like a configuration problem: a TTransportException during split calculation usually points to a connection or server-side issue rather than to your MapReduce code itself.
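For reference, here is a minimal sketch of how a Cassandra-backed Hadoop job is typically configured through ConfigHelper when reading via the Thrift ColumnFamilyInputFormat (as the stack trace suggests this job does). The keyspace, column family, address, and partitioner values below are placeholders, not taken from the question; the point is that the initial address, RPC port, and partitioner must match the running cluster, since a mismatch there is a common cause of TTransportException inside getSplits().

```java
import java.nio.ByteBuffer;

import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobSetupSketch {

    // Hypothetical helper: wires a Job to read from Cassandra over Thrift.
    public static void configureCassandraInput(Job job) {
        Configuration conf = job.getConfiguration();

        // Placeholder keyspace and column family names.
        ConfigHelper.setInputColumnFamily(conf, "my_keyspace", "my_column_family");

        // The initial address and RPC port must point at a reachable Cassandra
        // node (rpc_address / rpc_port in cassandra.yaml). A wrong host or port
        // here commonly surfaces as a TTransportException while getting splits.
        ConfigHelper.setInputInitialAddress(conf, "XX.XXX.XXX.XXX");
        ConfigHelper.setInputRpcPort(conf, "9160");

        // Must match the partitioner configured on the cluster.
        ConfigHelper.setInputPartitioner(conf, "org.apache.cassandra.dht.Murmur3Partitioner");

        // Read all columns of each row (empty start/finish, no limit beyond count).
        SliceRange range = new SliceRange(ByteBuffer.wrap(new byte[0]),
                                          ByteBuffer.wrap(new byte[0]),
                                          false, Integer.MAX_VALUE);
        SlicePredicate predicate = new SlicePredicate().setSlice_range(range);
        ConfigHelper.setInputSlicePredicate(conf, predicate);

        job.setInputFormatClass(ColumnFamilyInputFormat.class);
    }
}
```

If your job's setup differs from this (different input format, CQL vs. Thrift, different port), posting that configuration code would make it much easier to pinpoint which setting the split request is failing on.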