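I am looking for an elegant and scalable way to search the meta information of certain entities.
Take the following meta changes for entities A and B: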
[{
"idEntity": "A",
"name": "Name of A",
"rating": 0.5,
"description": "Some short description of A",
"createdAtWeek": 1
}, {
"idEntity": "B",
"name": "Name of B",
"rating": 0.2,
"description": "Some short description of B",
"createdAtWeek": 1
}, {
"idEntity": "A",
"name": "Name of A improved",
"rating": 0.5,
"description": "Some longer description of A",
"createdAtWeek": 2
}, {
"idEntity": "A",
"name": "Name of A improved",
"rating": 0.6,
"description": "Some longer description of A",
"createdAtWeek": 3
}]
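I would like to be able to find the (unique) entities whose last element matches rating >= 0.2 and contains the word "of" in description. I would also like to be able to run the same criteria against a point in the past, which should give me a different result.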
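To make the intended semantics concrete, here is a minimal in-memory sketch in Python. It is only a reference model for small data, not a scalable solution; the helper names latest_as_of and search are hypothetical, and matching the word as a whole word is an assumption about what "contains the word" should mean.

```python
import re

# The four change records from the question.
CHANGES = [
    {"idEntity": "A", "name": "Name of A", "rating": 0.5,
     "description": "Some short description of A", "createdAtWeek": 1},
    {"idEntity": "B", "name": "Name of B", "rating": 0.2,
     "description": "Some short description of B", "createdAtWeek": 1},
    {"idEntity": "A", "name": "Name of A improved", "rating": 0.5,
     "description": "Some longer description of A", "createdAtWeek": 2},
    {"idEntity": "A", "name": "Name of A improved", "rating": 0.6,
     "description": "Some longer description of A", "createdAtWeek": 3},
]


def latest_as_of(changes, week):
    """Return {idEntity: most recent record with createdAtWeek <= week}."""
    latest = {}
    for rec in sorted(changes, key=lambda r: r["createdAtWeek"]):
        if rec["createdAtWeek"] <= week:
            latest[rec["idEntity"]] = rec  # later weeks overwrite earlier ones
    return latest


def search(changes, week, min_rating, word):
    """Ids of entities whose state as of `week` satisfies both criteria."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    return sorted(
        entity_id
        for entity_id, rec in latest_as_of(changes, week).items()
        if rec["rating"] >= min_rating and pattern.search(rec["description"])
    )
```

With the sample data, search(CHANGES, 3, 0.6, "of") selects only A, while the same call for week 1 selects nothing, which is the "different result in the past" behaviour described above.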
The simplest way to do this with Mongo is to build an aggregation pipeline, but that becomes too slow once the collection grows.
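The pipeline itself is not shown in the question. One plausible shape, sketched here as a plain Python list that could be passed to a pymongo collection's aggregate() method, is the following; the function name build_pipeline and the field alias latest are hypothetical.

```python
import re


def build_pipeline(week, min_rating, word):
    """Build an aggregation pipeline for 'state as of week, then filter'."""
    return [
        # Ignore changes made after the requested point in time.
        {"$match": {"createdAtWeek": {"$lte": week}}},
        # Order the changes so that $last picks the newest one per entity.
        {"$sort": {"createdAtWeek": 1}},
        {"$group": {"_id": "$idEntity", "latest": {"$last": "$$ROOT"}}},
        # Apply the search criteria to the reconstructed latest state only.
        {"$match": {
            "latest.rating": {"$gte": min_rating},
            "latest.description": {"$regex": rf"\b{re.escape(word)}\b"},
        }},
    ]
```

The final $match runs on the grouped output, so the criteria cannot be used to narrow the input before the $sort and $group stages have processed every change up to the requested week. That is consistent with the slowdown described above.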
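So I went ahead and duplicated all documents so that every week (1-3) holds the complete data. That way I can include createdAtWeek directly in the query and be sure to get consistent results for different points in time.
But you can see where this leads: massive duplication that makes the collection unmanageable.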
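I therefore tried storing these documents in Solr, but judging from the documentation there seems to be no way to first group the results by entity and date and then search within the groups.
Is there another way to get the same result as duplication without actually duplicating?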
Answer 0 (score: 0)
The Solr Block Join Query Parser can handle this kind of operation.
The data structure changes to a hierarchical one, and createdAtWeek is replaced by validSince_i and validUntil_i.
/* Entity A */
{
"path_s": "1.entity",
"id": "A",
"_childDocuments_": [
{
"path_s": "2.metadata.rating",
"id": "2.metadata.rating.1",
"_childDocuments_": [
{
"path_s": "3.metadata.rating.timeValidity",
"id": "2.metadata.rating.timeValidity.1",
"validSince_i": -1,
"validUntil_i": 2,
"value_f": 0.5
},
{
"path_s": "3.metadata.rating.timeValidity",
"id": "2.metadata.rating.timeValidity.2",
"validSince_i": 3,
"validUntil_i": 9999999,
"value_f": 0.6
}
]
},
{
"path_s": "2.metadata.description",
"id": "2.metadata.description.1",
"_childDocuments_": [
{
"path_s": "3.metadata.description.timeValidity",
"id": "2.metadata.description.timeValidity.1",
"validSince_i": -1,
"validUntil_i": 1,
"value_txt_en": "Some short description of A"
},
{
"path_s": "3.metadata.description.timeValidity",
"id": "2.metadata.description.timeValidity.2",
"validSince_i": 2,
"validUntil_i": 9999999,
"value_txt_en": "Some longer description of A"
}
]
}
]
}
/* Entity B */
{
"path_s": "1.entity",
"id": "B",
"_childDocuments_": [
{
"path_s": "2.metadata.rating",
"id": "2.metadata.rating.2",
"_childDocuments_": [
{
"path_s": "3.metadata.rating.timeValidity",
"id": "2.metadata.rating.timeValidity.3",
"validSince_i": -1,
"validUntil_i": 9999999,
"value_f": 0.2
}
]
}
]
}
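The validity intervals in these documents can be derived from the weekly change records of the question. The following Python sketch shows one way to do that; the function name to_intervals and the output key names are hypothetical, the input records are abbreviated to the fields used, and the sentinels -1 and 9999999 are simply the ones that appear in the documents above.

```python
OPEN_START = -1      # sentinel for "valid from the beginning"
OPEN_END = 9999999   # sentinel for "still valid"

# Change records from the question, reduced to the relevant fields.
CHANGES = [
    {"idEntity": "A", "rating": 0.5, "createdAtWeek": 1,
     "description": "Some short description of A"},
    {"idEntity": "B", "rating": 0.2, "createdAtWeek": 1,
     "description": "Some short description of B"},
    {"idEntity": "A", "rating": 0.5, "createdAtWeek": 2,
     "description": "Some longer description of A"},
    {"idEntity": "A", "rating": 0.6, "createdAtWeek": 3,
     "description": "Some longer description of A"},
]


def to_intervals(changes, entity, field):
    """Collapse the records of one entity and field into value intervals."""
    records = sorted(
        (r for r in changes if r["idEntity"] == entity),
        key=lambda r: r["createdAtWeek"],
    )
    intervals = []
    for rec in records:
        if intervals and intervals[-1]["value"] == rec[field]:
            continue  # value unchanged, the current interval keeps running
        if intervals:
            # Close the previous interval one week before the change.
            intervals[-1]["validUntil"] = rec["createdAtWeek"] - 1
        intervals.append({
            "validSince": rec["createdAtWeek"],
            "validUntil": OPEN_END,
            "value": rec[field],
        })
    if intervals:
        intervals[0]["validSince"] = OPEN_START
    return intervals
```

Because each interval is closed one week before the next one opens, intervals built this way never overlap for a given entity and field.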
As long as the timeValidity documents of a field do not overlap each other, a block join can now be used for filtering:
fq={!parent which="path_s:1.entity"}(path_s:3.metadata.rating.timeValidity AND validUntil_i:[2 TO *] AND value_f:[0.3 TO *])&fq={!parent which="path_s:1.entity"}(path_s:3.metadata.description.timeValidity AND validUntil_i:[2 TO *] AND value_txt_en:short)&q=*:*
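This returns no result, because from week 2 onward there is no entity with rating >= 0.3 whose description contains "short".
Running the following works as well: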
fq={!parent which="path_s:1.entity"}(path_s:3.metadata.rating.timeValidity AND validUntil_i:[2 TO *] AND value_f:[0.3 TO *])&fq={!parent which="path_s:1.entity"}(path_s:3.metadata.description.timeValidity AND validUntil_i:[2 TO *] AND value_txt_en:longer)&q=*:*
As you can see, entity A shows up, because for week >= 2 its rating is >= 0.3 and its description contains "longer".
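The two filter queries differ only in the field path, the value clause and the week, so they can be generated. Below is a small Python sketch; the helper name block_join_fq and the point_in_time flag are hypothetical. With the flag left off, the helper reproduces the filters above exactly. With the flag on, it also requires validSince_i to be at or before the requested week, which is an assumption about how a strict "as of week N" lookup could be expressed and is not part of the queries above.

```python
def block_join_fq(field, value_clause, week, point_in_time=False,
                  parent="path_s:1.entity"):
    """Build one {!parent} filter on a field's timeValidity children."""
    clauses = [f"path_s:3.metadata.{field}.timeValidity"]
    if point_in_time:
        # Assumption: restrict to intervals that had already started by `week`.
        clauses.append(f"validSince_i:[* TO {week}]")
    clauses.append(f"validUntil_i:[{week} TO *]")
    clauses.append(value_clause)
    return f'{{!parent which="{parent}"}}({" AND ".join(clauses)})'


def build_params(week, rating_clause, description_clause, point_in_time=False):
    """Request parameters as (name, value) pairs; fq appears twice."""
    return [
        ("q", "*:*"),
        ("fq", block_join_fq("rating", rating_clause, week, point_in_time)),
        ("fq", block_join_fq("description", description_clause, week,
                             point_in_time)),
    ]
```

For example, build_params(2, "value_f:[0.3 TO *]", "value_txt_en:longer") yields the same q and fq values as the second query above. The pairs still need URL encoding before being sent to the select handler.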
Performance still needs to be assessed, but this gets the job done and avoids the duplication.