OutOfOrderScannerNextException when filtering results in HBase

Date: 2014-07-17 10:47:00

Tags: java filter hbase

I am trying to filter results in HBase like this:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Range filters on the "source" and "target" columns of family "cf".
List<Filter> andFilterList = new ArrayList<>();
SingleColumnValueFilter sourceLowerFilter = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("source"), CompareFilter.CompareOp.GREATER, Bytes.toBytes(lowerLimit));
sourceLowerFilter.setFilterIfMissing(true);
SingleColumnValueFilter sourceUpperFilter = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("source"), CompareFilter.CompareOp.LESS_OR_EQUAL, Bytes.toBytes(upperLimit));
sourceUpperFilter.setFilterIfMissing(true);
SingleColumnValueFilter targetLowerFilter = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("target"), CompareFilter.CompareOp.GREATER, Bytes.toBytes(lowerLimit));
targetLowerFilter.setFilterIfMissing(true);
SingleColumnValueFilter targetUpperFilter = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("target"), CompareFilter.CompareOp.LESS_OR_EQUAL, Bytes.toBytes(upperLimit));
targetUpperFilter.setFilterIfMissing(true);

// Both upper-bound filters must pass.
andFilterList.add(sourceUpperFilter);
andFilterList.add(targetUpperFilter);
FilterList andFilter = new FilterList(FilterList.Operator.MUST_PASS_ALL, andFilterList);

// At least one of the lower-bound filters must pass.
List<Filter> orFilterList = new ArrayList<>();
orFilterList.add(sourceLowerFilter);
orFilterList.add(targetLowerFilter);
FilterList orFilter = new FilterList(FilterList.Operator.MUST_PASS_ONE, orFilterList);

// Combined: (sourceUpper && targetUpper) && (sourceLower || targetLower)
FilterList fl = new FilterList(FilterList.Operator.MUST_PASS_ALL);
fl.addFilter(andFilter);
fl.addFilter(orFilter);

Scan edgeScan = new Scan();
edgeScan.setFilter(fl);
ResultScanner edgeScanner = table.getScanner(edgeScan);
Result edgeResult;
logger.info("Writing edges...");
while ((edgeResult = edgeScanner.next()) != null) {
    // Some code
}

This code produces the following error:

org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:402)
    at org.deustotech.internet.phd.framework.rdf2subdue.RDF2Subdue.writeFile(RDF2Subdue.java:150)
    at org.deustotech.internet.phd.framework.rdf2subdue.RDF2Subdue.run(RDF2Subdue.java:39)
    at org.deustotech.internet.phd.Main.main(Main.java:32)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 178 number_of_rows: 100 close_scanner: false next_call_seq: 0
    at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3098)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
    at java.lang.Thread.run(Thread.java:745)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:285)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:354)
    ... 9 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException): org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 178 number_of_rows: 100 close_scanner: false next_call_seq: 0
    at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3098)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
    at java.lang.Thread.run(Thread.java:745)

    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1657)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1715)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29900)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:174)
    ... 13 more

The RPC timeout is set to 600000. I have tried removing some of the filters, with these results:

  • sourceUpperFilter && (sourceLowerFilter || targetLowerFilter) -> succeeds
  • targetUpperFilter && (sourceLowerFilter || targetLowerFilter) -> succeeds
  • (sourceUpperFilter && targetUpperFilter) && (sourceLowerFilter) -> fails
  • (sourceUpperFilter && targetUpperFilter) && (targetLowerFilter) -> fails

Any help would be appreciated. Thank you.

2 answers:

Answer 0 (score: 2)

I solved this problem by setting hbase.client.scanner.caching.

See also:

    The client and RS maintain the nextCallSeq number during the scan. Every next() call from the client to the server increments this number on both sides. The client passes this number along with the request, and on the RS side the incoming nextCallSeq is matched against its own nextCallSeq. In case of a timeout, this increment does not happen on the client side. If the server side had already finished fetching the next batch of data, the nextCallSeq numbers will not match. The server then throws OutOfOrderScannerNextException, and the client reopens the scanner with the start row set to the last successfully retrieved row.

Since the problem is caused by a client-side timeout, either reducing the client scanner caching size (hbase.client.scanner.caching) or increasing the RPC timeout (hbase.rpc.timeout) accordingly will fix it.
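
Below is a minimal sketch of both settings, assuming the 0.96/0.98-era client API; the concrete values and the table name ("edges") are placeholders for illustration, not values from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;

Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.client.scanner.caching", 20); // fewer rows per next() RPC, so each batch returns in time
conf.setInt("hbase.rpc.timeout", 600000);        // ...or raise the RPC timeout instead

HTable table = new HTable(conf, "edges");        // hypothetical table name
Scan edgeScan = new Scan();
edgeScan.setCaching(20);                         // per-scan override of the caching size
edgeScan.setFilter(fl);                          // reuse the FilterList (fl) built in the question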

Hope this answer helps.

Answer 1 (score: 0)

Reason: the scan is picking out only a few rows from a big region. It takes time to fill the number of rows requested by the client side, and by then the client hits its RPC timeout. So the client retries the call on the same scanner. Remember that the next call from the client says "give me the next N rows from where you are". The old, failed call was still in progress and had already advanced past some rows, so the retried call would miss those rows. To avoid this, and to distinguish this case, we have the scan sequence number and this exception. On seeing it, the client closes the scanner and creates a new one with the correct start row. But this retry can happen once more, and that call may also time out.

So we have to tune the timeout and/or the scan caching values. A heartbeat mechanism avoids timeouts on long-running scans.

In our case, where the amount of data in HBase is very large, we use an RPC timeout of 1800000 and a lease period of 1800000, together with fuzzy row filters and scan.setCaching(xxxx) // value needs to be adjusted.

Note: value filters are slower than row filters (because a full table scan takes a long time to execute).
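
For reference, here is a rough sketch of the row-key-based alternative mentioned above, using FuzzyRowFilter; the row-key layout ("????_edge"), the mask bytes, and the caching value are hypothetical illustrations, not details from the original answer:

import java.util.Collections;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

// Row keys shaped like "<4-byte id>_edge": mask byte 1 = position may vary,
// mask byte 0 = position must match the key template exactly.
Pair<byte[], byte[]> fuzzyKey = new Pair<byte[], byte[]>(
        Bytes.toBytes("????_edge"),
        new byte[] {1, 1, 1, 1, 0, 0, 0, 0, 0});

Scan scan = new Scan();
scan.setCaching(100); // tuned batch size, small enough to return within the RPC timeout
scan.setFilter(new FuzzyRowFilter(Collections.singletonList(fuzzyKey)));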

With all of the above precautions, we successfully query large amounts of data from HBase using MapReduce.

Hope this explanation helps.