Filter query on a Solr spatial field is returning all documents with a non-null value

Date: 2019-03-17 22:35:23

Tags: solr

Using Solr 7.7, I'm indexing simple rectangular polygons and trying to filter them by an arbitrary bounding box, so I'm using the "range query syntax" trick that is documented on Solr's website.

Here is the definition of the field:

<fieldType name="my_geom"
   class="solr.SpatialRecursivePrefixTreeFieldType"
   spatialContextFactory="Geo3D"
   planetModel="WGS84"
   distanceUnits="kilometers"
   format="WKT"
   geo="true"
/>
<field name="*_geom" type="my_geom" indexed="true" stored="true" />

And here is the query:

/select?fq=spatial_geom:[55.0260828,-115.5085624 TO 55.02646,-115.507337]&q=*:*
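
For reference, the range syntax takes the lower-left corner and then the upper-right corner, each written as lat,lon. A minimal sketch of building such a filter in Python follows; the helper name bbox_filter is hypothetical and only illustrates the string format, it is not part of any Solr client library.

```python
from urllib.parse import urlencode


def bbox_filter(field, min_lat, min_lon, max_lat, max_lon):
    """Build a Solr range-syntax filter: lower-left corner TO upper-right corner."""
    if min_lat > max_lat:
        raise ValueError("min_lat must not exceed max_lat")
    # Each corner is written as "lat,lon", in that order.
    return f"{field}:[{min_lat},{min_lon} TO {max_lat},{max_lon}]"


# The bounding box used in the query above.
fq = bbox_filter("spatial_geom", 55.0260828, -115.5085624, 55.02646, -115.507337)

# URL-encode the parameters before appending them to /select?
query_string = urlencode({"q": "*:*", "fq": fq})
print(fq)
print(query_string)
```

Building the string this way only guards against swapped latitude bounds; it does not change how Solr interprets the indexed shapes.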

I'm expecting to only get results that fall within the defined bounding box but I'm actually getting ALL documents that have a non-null value in the "spatial_geom" field. Here is an example of a document that I'm not expecting to get (but I am):

{
    "spatial_geom":"POLYGON((-118.080201721669 54.5864541583249,-118.080201721669 54.5865258517606,-118.080078279314 54.5865258517606,-118.080078279314 54.5864541583249,-118.080201721669 54.5864541583249))",
    ...[other fields redacted]
}

Edit 1: Upgraded to Solr 8.0.0 and still encountering the same problem. Given that I'm getting all documents (with a non-null value), I suspect that I'm doing something fundamentally wrong, but I just can't see it.

Edit 2: To rule out bad data by using simpler numbers, I loaded all of my documents with a fake polygon, using the WKT POLYGON((10 10,10 20,20 20,20 10,10 10)), and then queried using ?fq=spatial_geom:[30,30 TO 40,40]. It still returned ALL documents!

3 Answers:

Answer 0 (score: 1)

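Geo3D requires polygons to adhere to the "right-hand rule", so the exterior ring must be in counter-clockwise order and holes must be clockwise. If you get this wrong, the meaning of the shape is inverted, so that small rectangle in Alberta, Canada represents the inverse of that place. As a result, most of your shapes cover nearly the entire globe! There is definitely a documentation issue in Solr here. I didn't know this myself until I debugged it today. It appears that some of the GIS industry is migrating to this rule as well: http://mapster.me/right-hand-rule-geojson-fixer/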
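
One quick way to check for this is to compute the signed area of the exterior ring and reverse the ring if it turns out to be clockwise. The following is a rough sketch in Python, not part of Solr. The helper names (parse_ring, signed_area, to_ccw_wkt) are hypothetical, it assumes a hole-free WKT POLYGON with x = longitude and y = latitude, and it uses the planar shoelace formula, which should only be trusted for small polygons away from the poles and the antimeridian.

```python
import re


def parse_ring(wkt):
    """Return the exterior ring of a hole-free WKT POLYGON as 'x y' strings."""
    body = re.search(r"\(\((.*?)\)", wkt).group(1)
    # Keep the original coordinate text so no precision is lost on output.
    return [pair.strip() for pair in body.split(",")]


def signed_area(ring):
    """Planar shoelace formula: positive = counter-clockwise, negative = clockwise."""
    pts = [tuple(float(v) for v in pair.split()) for pair in ring]
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


def to_ccw_wkt(wkt):
    """Rewrite the polygon so that its exterior ring is counter-clockwise."""
    ring = parse_ring(wkt)
    if signed_area(ring) < 0:  # clockwise ring: reverse the vertex order
        ring = ring[::-1]
    return "POLYGON((" + ",".join(ring) + "))"


original = "POLYGON((10 10,10 20,20 20,20 10,10 10))"
print(signed_area(parse_ring(original)))
print(to_ccw_wkt(original))
```

Applied to the test polygon from the question, the signed area is negative, meaning the ring is clockwise, and the rewritten shape is POLYGON((10 10,20 10,20 20,10 20,10 10)).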

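Also: I'm curious to see how Geo3D compares to JTS once you get it working. In addition, you should probably use solr.RptWithGeometrySpatialField instead of solr.SpatialRecursivePrefixTreeFieldType to get the full precision of the vector geometry rather than settling for a grid representation of the shape. Otherwise, your queries might return false positives for shapes that are merely close to the query box. Another thing to experiment with is prefixTree="s2", a not-yet-documented prefix tree that is purportedly particularly efficient with Geo3D.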

Answer 1 (score: 0)

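Switching the spatial context factory to JTS resolved the problem, although I'm not sure why this was necessary, and it significantly slowed down data import.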

Here is the new field type definition:

<fieldType name="my_geom"
  class="solr.SpatialRecursivePrefixTreeFieldType"
  spatialContextFactory="JTS"
  autoIndex="true"
  distanceUnits="kilometers"
  format="WKT"
  geo="true"
  />

I also downloaded the JTS jar, as described in Solr's documentation.

However, I still can't explain why JTS is needed, because Solr's documentation led me to believe that I should be able to index polygons with Geo3D:

  

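"Geo3D is the colloquial name of the Lucene spatial-3d module, included with Solr. It's a computational geometry library implementing a variety of shapes (including polygons) on a sphere or WGS84 ellipsoid."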

Using Geo3D didn't produce any errors during import or at query time; it just didn't give me the expected results. Strange...

Answer 2 (score: 0)

I also used Solr's BBoxField type (with JTS). See the field type definition:

<fieldType name="my_bbox"
  class="solr.BBoxField"
  spatialContextFactory="JTS"
  format="WKT"
  geo="true"
  distanceUnits="kilometers"
  numberType="pdouble"
  />

I'm still not clear on why using SpatialRecursivePrefixTreeFieldType with Geo3D doesn't give correct results.

With BBoxField, data import is much faster than with SpatialRecursivePrefixTreeFieldType combined with JTS.