GeoMesa BBOX query does not return all results

Time: 2019-09-27 00:22:01

Tags: hbase geomesa

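I am running GeoMesa (with HBase) BBOX queries on OSM node data. I have found that for a particular region, GeoMesa does not return all of the nodes inside the bounding box.

For example, I ran 3 queries:

  1. BBOX(-122.0,47.4,-122.01,47.5) - output has 5,477 unique features
  2. BBOX(-122.0,47.5,-122.01,47.6) - output has 9,879 unique features
  3. BBOX(-122.0,47.4,-122.01,47.6) - output has 13,374 unique features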

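Looking at these bounding boxes, I would expect the features of Query 1 + Query 2 to equal those of Query 3. In practice they do not match. The sad part is that some elements in the combined results of Query 1 and Query 2 are not present in the Query 3 data at all.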
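One way to pin down the discrepancy is to compare feature IDs rather than counts. Query 1 and Query 2 share the edge at latitude 47.5, so a node lying exactly on that edge can appear in both results, and the sum of the two counts (5,477 + 9,879 = 15,356) may legitimately exceed the Query 3 count. An ID that appears in Query 1 or Query 2 but not in Query 3, however, is a genuine gap. The sketch below assumes the feature IDs of each query have already been exported to plain lists; the function name and the toy IDs are hypothetical.

```python
def compare_bbox_results(ids_q1, ids_q2, ids_q3):
    """Compare feature IDs from two adjacent boxes against the enclosing box."""
    s1, s2, s3 = set(ids_q1), set(ids_q2), set(ids_q3)
    union = s1 | s2
    return {
        # Features returned by both sub-queries (e.g. on the shared edge).
        "shared_by_q1_q2": s1 & s2,
        # Features a sub-query returned but the enclosing query did not.
        "missing_from_q3": union - s3,
        # Features only the enclosing query returned.
        "only_in_q3": s3 - union,
    }


# Toy data: "n3" sits on the shared edge, "n5" is missing from the big box.
q1 = ["n1", "n2", "n3"]
q2 = ["n3", "n4", "n5"]
q3 = ["n1", "n2", "n3", "n4"]

report = compare_bbox_results(q1, q2, q3)
print(sorted(report["missing_from_q3"]))  # prints ['n5']
```

If `missing_from_q3` is non-empty, those IDs are the ones worth plotting and tracing back to the regions that failed to scan.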

Below is the image after plotting the results in Kepler. Can anyone help me understand what the problem is and how to find its root cause?

Missing points in Query 3

I am seeing the exception below:

19/09/27 14:57:34 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=38583 ms ago, cancelled=false, msg=java.io.FileNotFoundException: File not present on S3
    at com.amazon.ws.emr.hadoop.fs.s3.S3FSInputStream.read(S3FSInputStream.java:133)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:738)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1493)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1770)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1596)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:454)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:269)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:651)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:601)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:302)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:201)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:391)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:224)
    at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2208)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:6112)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:6086)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2841)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2821)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2803)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2797)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2697)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3012)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36613)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2380)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)

2 answers:

Answer 0: (score: 1)

This looks like an S3 consistency issue. Try running:

emrfs sync -m <your DynamoDB catalog table> s3://<your bucket>/<your hbase root dir>

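Then re-run your query. It is quite common for S3 and the DynamoDB table that manages the S3 consistency model for HBase to get out of sync. Running this sync command as a cron job can help avoid the problem, or resolve it automatically when it does occur.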
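As a rough sketch of what a cron-friendly wrapper could look like, the snippet below only assembles and runs the same `emrfs sync -m` command shown above. The function names are hypothetical, and the catalog table and HBase root URI are left as parameters because the real values are specific to your cluster.

```python
import subprocess


def build_sync_command(catalog_table, hbase_root_uri):
    """Return the emrfs sync command from the answer as an argument list."""
    return ["emrfs", "sync", "-m", catalog_table, hbase_root_uri]


def run_sync(catalog_table, hbase_root_uri):
    """Run the sync; raises CalledProcessError if emrfs exits non-zero."""
    subprocess.run(build_sync_command(catalog_table, hbase_root_uri), check=True)
```

A scheduled script would call `run_sync(...)` with your own table name and S3 path; letting a failure raise means cron's usual mail or log output surfaces the error instead of hiding it.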

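Answer 1: (score: 0)

Edit: given the additional information about the S3 exception, this suggestion no longer applies.

I would try disabling "loose bounding box", as described here. If that does not resolve the discrepancy, please file a bug report on the GeoMesa JIRA, preferably with reproducible steps.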

Thanks