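I seem to be getting inconsistent facet counts, and I would like to understand why the two results differ. I ran the two queries below, and at least one term has a slightly different count between them: term 21, near the bottom of the first result, is 948 in one and 1035 in the other. Term 43, the last entry of the first result, also differs (513 vs 910).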
Query #1:
{'facets': {'primary_country_id': {'terms': {'field': 'primary_country_id', 'size': '20'}}}}
Query #2:
{'facets': {'primary_country_id': {'terms': {'field': 'primary_country_id', 'size': '30'}}}}
Results of query #1:
{
"primary_country_id": {
"_type": "terms",
"missing": 3475,
"total": 312111,
"other": 4460,
"terms": [
{
"term": 41,
"count": 187293
},
{
"term": 9,
"count": 24177
},
{
"term": 50,
"count": 17200
},
{
"term": 15,
"count": 13015
},
{
"term": 30,
"count": 10296
},
{
"term": 32,
"count": 8824
},
{
"term": 6,
"count": 7703
},
{
"term": 23,
"count": 7502
},
{
"term": 2,
"count": 5614
},
{
"term": 33,
"count": 5214
},
{
"term": 16,
"count": 4691
},
{
"term": 24,
"count": 3560
},
{
"term": 31,
"count": 3126
},
{
"term": 7,
"count": 2748
},
{
"term": 12,
"count": 1430
},
{
"term": 19,
"count": 1403
},
{
"term": 8,
"count": 1342
},
{
"term": 46,
"count": 1052
},
{
"term": 21,
"count": 948
},
{
"term": 43,
"count": 513
}
]
}
}
Results of query #2:
{
"primary_country_id": {
"_type": "terms",
"missing": 3475,
"total": 312111,
"other": 0,
"terms": [
{
"term": 41,
"count": 187293
},
{
"term": 9,
"count": 24177
},
{
"term": 50,
"count": 17200
},
{
"term": 15,
"count": 13015
},
{
"term": 30,
"count": 10296
},
{
"term": 32,
"count": 8824
},
{
"term": 6,
"count": 7703
},
{
"term": 23,
"count": 7502
},
{
"term": 2,
"count": 5614
},
{
"term": 33,
"count": 5214
},
{
"term": 16,
"count": 4691
},
{
"term": 24,
"count": 3560
},
{
"term": 31,
"count": 3126
},
{
"term": 7,
"count": 2748
},
{
"term": 12,
"count": 1430
},
{
"term": 19,
"count": 1403
},
{
"term": 8,
"count": 1342
},
{
"term": 46,
"count": 1052
},
{
"term": 21,
"count": 1035
},
{
"term": 43,
"count": 910
},
{
"term": 22,
"count": 906
},
{
"term": 13,
"count": 717
},
{
"term": 28,
"count": 690
},
{
"term": 38,
"count": 415
},
{
"term": 26,
"count": 352
},
{
"term": 37,
"count": 295
},
{
"term": 25,
"count": 208
},
{
"term": 34,
"count": 207
},
{
"term": 4,
"count": 94
},
{
"term": 48,
"count": 92
}
]
}
}
Answer 0 (score: 1)
Answer 1 (score: 1)
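This can happen in any distributed system and, as the other answer mentions, there is a GitHub issue for it. Each shard returns only its own top terms, so a term that misses the cut on one shard loses that shard's documents from its merged count. The only solution that is 100% guaranteed is to use a single shard, but that does not scale.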
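To make the effect concrete, here is a minimal sketch of a per-shard top-N merge. It is a simplified model, not Elasticsearch's actual implementation: the function name `merged_top_terms` and the two-shard data are made up for illustration, and it assumes each shard reports only its own top `size` terms before the coordinating node sums them.

```python
from collections import Counter


def merged_top_terms(shards, size):
    """Simplified model of a distributed terms facet.

    Each shard reports only its own top `size` terms; the coordinator
    sums whatever it received and keeps the overall top `size`.
    """
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(size):
            merged[term] += count
    return merged.most_common(size)


# Hypothetical data: term -> document count on each of two shards.
shards = [
    {"a": 100, "b": 50, "c": 40},
    {"a": 100, "c": 60, "b": 45},
]

# With size=2, shard 1 never reports "c", so its 40 documents are lost.
print(merged_top_terms(shards, size=2))  # [('a', 200), ('c', 60)]

# With size=3, every shard reports every term and the counts are exact.
print(merged_top_terms(shards, size=3))  # [('a', 200), ('c', 100), ('b', 95)]
```

The same term gets a different count depending on `size`, which is the pattern in the question: the terms near the cutoff of the smaller request are the ones that come out undercounted.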
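The problem shows up on high-cardinality fields, that is, fields with a large number of unique terms. You can use the shard_size parameter to control how many facet entries are requested from each shard. This can differ from size (default 10), which is the number of entries returned to you. For example, setting size to 10 and shard_size to 100 should improve things, but it does not guarantee that every count is exact; it only reduces the odds of seeing a wrong count. Whether you still get wrong counts depends on the cardinality of the field you are faceting on. As you can imagine, if a field has 100 unique terms, a shard_size of 100 (or more) guarantees perfect counts every time, because every shard then reports every term.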
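As a rough sketch, the request from the question could be extended with shard_size like this. The helper name `terms_facet_request` is hypothetical, and whether shard_size is accepted on a terms facet depends on your Elasticsearch version, so treat this as an illustration rather than a verified request.

```python
import json


def terms_facet_request(field, size=10, shard_size=100):
    """Build a terms-facet request body that asks each shard for
    `shard_size` entries while returning only `size` to the caller."""
    return {
        "facets": {
            field: {
                "terms": {
                    "field": field,
                    "size": size,
                    "shard_size": shard_size,
                }
            }
        }
    }


# Same field as in the question; the values match the example in the text.
body = terms_facet_request("primary_country_id", size=10, shard_size=100)
print(json.dumps(body, indent=2))
```

In the question's data the second result lists 30 terms with "other": 0, so the field appears to have only 30 distinct values. Under that reading, any shard_size of 30 or more should already make each shard report every term.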