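Intermittent DSE Hadoop timeout errors during reads

Date: 2014-10-24 19:47:29

Tags: hadoop datastax-enterprise datastax

I have a strange error that started occurring a few weeks ago. We had to replace a few analytics nodes, and none of the Hadoop jobs invoked by Hive can complete. They crash at different stages with similar errors: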

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ip-x-x-x-x.ec2.internal/x.x.x.x:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
    at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
    at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.prepareNextRow(ArrayBackedResultSet.java:259)
    at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.isExhausted(ArrayBackedResultSet.java:222)
    at com.datastax.driver.core.ArrayBackedResultSet$1.hasNext(ArrayBackedResultSet.java:115)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator.computeNext(CqlRecordReader.java:239)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator.computeNext(CqlRecordReader.java:218)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader.getProgress(CqlRecordReader.java:152)
    at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.getProgress(CqlHiveRecordReader.java:62)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.getProgress(HiveRecordReader.java:71)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.getProgress(MapTask.java:260)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:233)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ip-x-x-x-x.ec2.internal/x.x.x.x:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
    at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
    at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

I turned on debug logging, but I still could not find anything that was happening at the time of the failures.

Thanks!

1 Answer:

Answer 0: (score: 0)

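It turned out the problem was that an application was writing a large amount of data into one of the map columns. That just happened to coincide with the cluster update. The Hive job simply hung and then failed with a misleading error message. Through some trial and error, I was able to narrow the problem down to that map column and delete the offending data.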
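
For anyone hitting the same thing, here is a rough sketch of how the "narrow it down" step could be automated. It is not what I actually ran. It assumes the rows have already been exported as (row key, map) pairs with string keys and values; the function names, the sample rows, and the 64 KB threshold are all made up for illustration. It only estimates the payload of each map value and lists the rows that exceed the limit, largest first.

```python
from typing import Dict, Iterable, List, Tuple


def map_payload_bytes(m: Dict[str, str]) -> int:
    """Rough size of a map value: UTF-8 bytes of all keys plus all values."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in m.items())


def find_oversized_rows(
    rows: Iterable[Tuple[str, Dict[str, str]]], limit_bytes: int
) -> List[Tuple[str, int]]:
    """Return (row_key, estimated_size) for maps above limit_bytes, largest first."""
    sized = [(key, map_payload_bytes(m)) for key, m in rows]
    hits = [(key, size) for key, size in sized if size > limit_bytes]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical sample data: one small map and one bloated map.
sample_rows = [
    ("row-a", {"k1": "v1"}),
    ("row-b", {"k%d" % i: "x" * 100 for i in range(1000)}),
]

# The 64 KB threshold is an arbitrary assumption, not a Cassandra limit.
oversized = find_oversized_rows(sample_rows, limit_bytes=64 * 1024)
for key, size in oversized:
    print(key, size)
```

With the sample data only row-b is reported. On real data the threshold would need tuning, since the estimate ignores serialization overhead.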