Multi-criteria search across time

Date: 2018-05-28 10:20:56

Tags: mongodb search solr

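I am looking for an elegant and scalable way to search the meta information of some entities.

Take the following metadata changes for entities A and B: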

[{
    "idEntity": "A",
    "name": "Name of A",
    "rating": 0.5,
    "description": "Some short description of A",
    "createdAtWeek": 1
}, {
    "idEntity": "B",
    "name": "Name of B",
    "rating": 0.2,
    "description": "Some short description of B",
    "createdAtWeek": 1
}, {
    "idEntity": "A",
    "name": "Name of A improved",
    "rating": 0.5,
    "description": "Some longer description of A",
    "createdAtWeek": 2
}, {
    "idEntity": "A",
    "name": "Name of A improved",
    "rating": 0.6,
    "description": "Some longer description of A",
    "createdAtWeek": 3
}]

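I want to be able to find the (unique) entities whose latest element matches rating >= 0.2 and whose description contains the word of. I also want to be able to search with the same criteria at a point in the past, which should give me different results.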
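To make the requirement concrete, here is a minimal in-memory sketch of the "latest element as of a given week" search. The function name `search_as_of` and the whole-word matching rule are assumptions made for illustration; only the fields needed for the search are kept from the sample data.

```python
# Sample snapshots from the question (the "name" field is omitted for brevity).
SNAPSHOTS = [
    {"idEntity": "A", "rating": 0.5, "description": "Some short description of A", "createdAtWeek": 1},
    {"idEntity": "B", "rating": 0.2, "description": "Some short description of B", "createdAtWeek": 1},
    {"idEntity": "A", "rating": 0.5, "description": "Some longer description of A", "createdAtWeek": 2},
    {"idEntity": "A", "rating": 0.6, "description": "Some longer description of A", "createdAtWeek": 3},
]


def search_as_of(snapshots, week, min_rating, word):
    """Return the sorted ids of entities whose latest snapshot at `week` matches.

    Hypothetical helper: "matches" means rating >= min_rating and `word`
    appears as a whole word in the description (case-insensitive).
    """
    latest = {}
    for doc in snapshots:
        if doc["createdAtWeek"] > week:
            continue  # this snapshot did not exist yet at `week`
        current = latest.get(doc["idEntity"])
        if current is None or doc["createdAtWeek"] > current["createdAtWeek"]:
            latest[doc["idEntity"]] = doc
    return sorted(
        entity_id
        for entity_id, doc in latest.items()
        if doc["rating"] >= min_rating
        and word.lower() in doc["description"].lower().split()
    )


print(search_as_of(SNAPSHOTS, 1, 0.2, "short"))
print(search_as_of(SNAPSHOTS, 3, 0.2, "short"))
```

The same criteria give different answers depending on the week: at week 1 both entities still have a "short" description, while at week 3 only B does.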

The simplest way to do this with Mongo is to create an aggregation pipeline, but that becomes too slow once the collection grows large.
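For reference, such a pipeline could look roughly like the sketch below. This is not the pipeline from the question (which was not posted); the function name `build_as_of_pipeline` and the whole-word regex are assumptions. It only builds the list of stages, so it runs without a database.

```python
import re


def build_as_of_pipeline(week, min_rating, word):
    """Build aggregation stages that pick the latest snapshot per entity
    at `week` and then filter on rating and description.

    Hypothetical helper; the field names come from the sample documents.
    """
    return [
        # Ignore snapshots created after the requested week.
        {"$match": {"createdAtWeek": {"$lte": week}}},
        # Newest snapshot first, so $first below picks the latest one.
        {"$sort": {"createdAtWeek": -1}},
        {"$group": {"_id": "$idEntity", "latest": {"$first": "$$ROOT"}}},
        # Apply the search criteria to the latest snapshot only.
        {
            "$match": {
                "latest.rating": {"$gte": min_rating},
                "latest.description": {
                    "$regex": rf"\b{re.escape(word)}\b",
                    "$options": "i",
                },
            }
        },
    ]


pipeline = build_as_of_pipeline(2, 0.2, "of")
```

With pymongo the stages would be passed to `collection.aggregate(pipeline)`, where `collection` is whatever collection holds the snapshots (not named in the question). Because the criteria can only be applied after the `$group` stage, every snapshot up to the requested week has to be grouped first, which is consistent with the slowdown described above.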

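So I went ahead and duplicated all documents so that every week (1-3) has the complete data. That way I can include createdAtWeek directly in the query and be sure I get consistent results for different points in time.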

But you can see where this leads: the massive duplication makes the collection impractical.

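So I tried storing these documents in Solr, but from reading the documentation there seems to be no way to first group the results by entity and date and then search within the groups.

Is there another way to achieve the same result as the duplication without actually duplicating?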

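1 answer:

Answer 0: (score: 0)

The Solr Block Join Query Parser can handle this kind of operation.

The data structure changes to a hierarchical one, and createdAtWeek is replaced by validSince_i and validUntil_i: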

/* Entity A */
{
  "path_s": "1.entity",
  "id": "A",
  "_childDocuments_": [
    {
      "path_s": "2.metadata.rating",
      "id": "2.metadata.rating.1",
      "_childDocuments_": [
        {
          "path_s": "3.metadata.rating.timeValidity",
          "id": "2.metadata.rating.timeValidity.1",
          "validSince_i": -1,
          "validUntil_i": 2,
          "value_f": 0.5
        },
        {
          "path_s": "3.metadata.rating.timeValidity",
          "id": "2.metadata.rating.timeValidity.2",
          "validSince_i": 3,
          "validUntil_i": 9999999,
          "value_f": 0.6
        }
      ]
    },
    {
      "path_s": "2.metadata.description",
      "id": "2.metadata.description.1",
      "_childDocuments_": [
        {
          "path_s": "3.metadata.description.timeValidity",
          "id": "2.metadata.description.timeValidity.1",
          "validSince_i": -1,
          "validUntil_i": 1,
          "value_txt_en": "Some short description of A"
        },
        {
          "path_s": "3.metadata.description.timeValidity",
          "id": "2.metadata.description.timeValidity.2",
          "validSince_i": 2,
          "validUntil_i": 9999999,
          "value_txt_en": "Some longer description of A"
        }
      ]
    }
  ]
}
/* Entity B */
{
  "path_s": "1.entity",
  "id": "B",
  "_childDocuments_": [
    {
      "path_s": "2.metadata.rating",
      "id": "2.metadata.rating.2",
      "_childDocuments_": [
        {
          "path_s": "3.metadata.rating.timeValidity",
          "id": "2.metadata.rating.timeValidity.3",
          "validSince_i": -1,
          "validUntil_i": 9999999,
          "value_f": 0.2
        }
      ]
    }
  ]
}
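The validity intervals above can be derived mechanically from the weekly snapshots. The following is a minimal sketch under stated assumptions: the helper name `to_validity_intervals` is hypothetical, it uses a generic `value` key instead of the typed `value_f` / `value_txt_en` fields, and it reuses -1 and 9999999 as the open-ended bounds seen in the documents.

```python
OPEN_START = -1        # "valid since forever", as in the documents above
OPEN_END = 9999999     # "still valid", as in the documents above

# Weekly snapshots of entity A from the question.
A_SNAPSHOTS = [
    {"rating": 0.5, "description": "Some short description of A", "createdAtWeek": 1},
    {"rating": 0.5, "description": "Some longer description of A", "createdAtWeek": 2},
    {"rating": 0.6, "description": "Some longer description of A", "createdAtWeek": 3},
]


def to_validity_intervals(snapshots, field):
    """Collapse the snapshots of one entity into non-overlapping intervals
    for a single field, merging consecutive weeks with an unchanged value."""
    intervals = []
    for doc in sorted(snapshots, key=lambda d: d["createdAtWeek"]):
        value = doc[field]
        if intervals and intervals[-1]["value"] == value:
            continue  # value unchanged, the open interval keeps running
        since = doc["createdAtWeek"] if intervals else OPEN_START
        if intervals:
            # Close the previous interval one week before the change.
            intervals[-1]["validUntil_i"] = doc["createdAtWeek"] - 1
        intervals.append(
            {"validSince_i": since, "validUntil_i": OPEN_END, "value": value}
        )
    return intervals


for field in ("rating", "description"):
    print(field, to_validity_intervals(A_SNAPSHOTS, field))
```

Each value is stored once per change rather than once per week, which is where the saving over full duplication comes from. Intervals built this way cannot overlap, because a new one only opens after the previous one has been closed.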

As long as the timeValidities do not overlap each other, BlockJoin can now be used to filter on them:

fq={!parent which="path_s:1.entity"}(path_s:3.metadata.rating.timeValidity AND validUntil_i:[2 TO *] AND value_f:[0.3 TO *])&fq={!parent which="path_s:1.entity"}(path_s:3.metadata.description.timeValidity AND validUntil_i:[2 TO *] AND value_txt_en:short)&q=*:*

No entity is returned, because there is no entity that, from week 2 onward, has a rating >= 0.3 and a description containing short.
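Writing these filters by hand is error-prone, so a small builder can help. This is only a sketch: `block_join_fq` and `build_params` are hypothetical helpers, and `word` is assumed to be a single plain term that needs no query escaping.

```python
from urllib.parse import urlencode

PARENT_FILTER = "path_s:1.entity"  # selects the top-level entity documents


def block_join_fq(field, week, value_clause):
    """Build one {!parent} filter over the timeValidity children of `field`."""
    return (
        f'{{!parent which="{PARENT_FILTER}"}}'
        f"(path_s:3.metadata.{field}.timeValidity"
        f" AND validUntil_i:[{week} TO *]"
        f" AND {value_clause})"
    )


def build_params(week, min_rating, word):
    """Return the request parameters as (name, value) pairs.

    A list of pairs is used because the fq parameter is repeated.
    """
    return [
        ("fq", block_join_fq("rating", week, f"value_f:[{min_rating} TO *]")),
        ("fq", block_join_fq("description", week, f"value_txt_en:{word}")),
        ("q", "*:*"),
    ]


params = build_params(2, 0.3, "short")
query_string = urlencode(params)  # URL-encoded form, ready to append to a select URL
```

Note that, like the query above, these filters only constrain validUntil_i. A strict point-in-time lookup would presumably also need a clause on validSince_i, such as `validSince_i:[* TO 2]`.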

Running the following works as well:

fq={!parent which="path_s:1.entity"}(path_s:3.metadata.rating.timeValidity AND validUntil_i:[2 TO *] AND value_f:[0.3 TO *])&fq={!parent which="path_s:1.entity"}(path_s:3.metadata.description.timeValidity AND validUntil_i:[2 TO *] AND value_txt_en:longer)&q=*:*

As you can see, entity A is returned, because from week 2 onward its rating is >= 0.3 and its description contains longer.
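The behaviour of the two queries can be checked without a Solr instance using a rough in-memory model: each `{!parent}` filter selects the parents that have at least one matching child, and multiple `fq` parameters are intersected. The flat `CHILDREN` list and the helper names below are assumptions for illustration, and text matching is simplified to lowercase whole-word comparison rather than real Solr analysis.

```python
# Leaf timeValidity documents, flattened to (entity, path, validUntil, value).
CHILDREN = [
    ("A", "rating", 2, 0.5),
    ("A", "rating", 9999999, 0.6),
    ("A", "description", 1, "Some short description of A"),
    ("A", "description", 9999999, "Some longer description of A"),
    ("B", "rating", 9999999, 0.2),
]


def parents_matching(children, path, week, predicate):
    """Entities with at least one child of `path` that is valid until
    `week` or later and whose value satisfies `predicate`."""
    return {
        entity
        for entity, child_path, valid_until, value in children
        if child_path == path and valid_until >= week and predicate(value)
    }


def search(children, week, min_rating, word):
    """Intersect the rating filter and the description filter."""
    by_rating = parents_matching(
        children, "rating", week, lambda v: v >= min_rating
    )
    by_text = parents_matching(
        children, "description", week, lambda v: word in v.lower().split()
    )
    return by_rating & by_text


print(search(CHILDREN, 2, 0.3, "short"))
print(search(CHILDREN, 2, 0.3, "longer"))
```

In this model the first search yields no entity and the second yields only A, in line with the two results described above.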

Performance still needs to be evaluated, but it gets the job done and avoids the duplication.