Question

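I am prototyping an e-commerce product catalog on ElasticSearch. Each product is indexed as one document (with attributes such as name and description).

One thing I can't work out: I would like to boost the score of certain products based on the user's purchase history.

The only option I can think of is to store the purchase history as child documents of the product, then use custom_filters_score with a filter that looks for a child document with the given userId. In that case the filter determines whether the given user has purchased the given product and, if so, the score is boosted.

The problem with this approach is that some products may be purchased hundreds of thousands of times per month, and I am not sure how ElasticSearch would behave in that case.

The perfect solution would be to keep the purchase history in a separate index, or in the same index but as a different document type (say 'userspurchasehistory'). Example document: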

{userId: 1234, purchesedProducts: [34,112323,1223,32342,31234]}

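Then boost the query score with something that expresses the following: if the term 34 (productId) occurs in the 'purchesedProducts' field of the userspurchasehistory (type) document whose 'userId' equals 1234, boost the query by a factor of 2.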
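To make the intended rule concrete, here is a minimal sketch of it in plain Python. It only illustrates the logic described above and is not an ElasticSearch feature; the function name `purchase_boost` and its signature are hypothetical, and the field names are copied from the example document as written.

```python
def purchase_boost(product_id, history_doc, user_id, factor=2.0):
    """Return the score multiplier for one product and one user.

    Hypothetical helper: applies `factor` only when the history document
    belongs to `user_id` and lists `product_id` among the purchased products.
    """
    if history_doc.get("userId") != user_id:
        return 1.0
    if product_id in history_doc.get("purchesedProducts", []):
        return factor
    return 1.0


# Example document from the question (field names kept as written there).
history = {"userId": 1234, "purchesedProducts": [34, 112323, 1223, 32342, 31234]}

print(purchase_boost(34, history, user_id=1234))  # purchased product
print(purchase_boost(99, history, user_id=1234))  # product never purchased
```

A purchased product gets the multiplier 2.0 and everything else stays at 1.0; the open question is how to have ElasticSearch apply the same rule at query time.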

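Any thoughts or ideas here?

Update:

I ran some tests against a large product catalog with a large volume of sales data:

Number of Product (type) documents: 500 000
Number of SalesHistory (type) documents: 14 000 000
Index size: 2.5GB
ElasticSearch: one node, all default settings

SalesHistory documents are child documents of the Product documents. Sales entries: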

~20% of products: 40 entries 
~20% of products: 30 entries 
~20% of products: 20 entries 
~20% of products: 10 entries 
~20% of products: 5 entries 

200 products with 10 000 sales entries (plus previously added 5-40 entries)
200 products with  5 000 sales entries (plus previously added 5-40 entries)
200 products with  2 500 sales entries (plus previously added 5-40 entries)
200 products with  1 000 sales entries (plus previously added 5-40 entries)
200 products with    500 sales entries (plus previously added 5-40 entries)
1 product 18 500 entries
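
As a rough sanity check, the distribution above can be summed and compared with the reported count of about 14 000 000 SalesHistory documents. This sketch assumes each "~20%" bucket is exactly one fifth of the 500 000 products, which the figures above only state approximately.

```python
# Rough check of the SalesHistory document count from the distribution above.
# Assumption: each "~20%" bucket is exactly one fifth of the 500 000 products.
PRODUCTS = 500_000
bucket_size = PRODUCTS // 5

# Baseline entries: five equal buckets with 40, 30, 20, 10 and 5 entries each.
base_entries = sum(bucket_size * n for n in (40, 30, 20, 10, 5))

# Heavily purchased products: five groups of 200 products each.
heavy_entries = sum(200 * n for n in (10_000, 5_000, 2_500, 1_000, 500))

# The single product with 18 500 entries.
single_product_entries = 18_500

total_entries = base_entries + heavy_entries + single_product_entries
print(total_entries)
```

Under that assumption the sum comes to roughly 14.3 million entries, in line with the document count quoted for the test.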

Example query:

curl -XGET "http://localhost:9200/demoproducts/_search" -d'
{
   "query": {
      "custom_filters_score": {
         "query": {
            "match_all": {}
         },
         "filters": [
            {
               "filter": {
                  "has_child": {
                     "type": "saleshistory",
                     "query": {
                        "term": {
                           "userId": {
                              "value": "28875"
                           }
                        }
                     }
                  }
               },
               "boost": 2
            }
         ]
      }
   }
}'
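
Since the userId changes per request, the request body would normally be generated rather than written by hand. The sketch below builds such a body as a Python dict; the helper name `build_purchase_boost_query` is hypothetical, and the body places "filters" inside "custom_filters_score" next to "query", which is where that query type expects it.

```python
import json


def build_purchase_boost_query(user_id, boost=2):
    """Return a search body that boosts products bought by `user_id`.

    Hypothetical helper: it follows the custom_filters_score / has_child
    structure from the question and only builds the JSON body; sending it
    to ElasticSearch is left out.
    """
    return {
        "query": {
            "custom_filters_score": {
                "query": {"match_all": {}},
                "filters": [
                    {
                        "filter": {
                            "has_child": {
                                "type": "saleshistory",
                                "query": {
                                    "term": {"userId": {"value": str(user_id)}}
                                },
                            }
                        },
                        "boost": boost,
                    }
                ],
            }
        }
    }


body = build_purchase_boost_query(28875)
print(json.dumps(body, indent=3))
```

The printed JSON can be passed to curl with -d, as in the example query.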

Result:

{
  "took": 33,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 500001,
    "max_score": 2
    ...
  }
}

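When I add some filters to the query (almost all of our queries include some filters), the response time is around 7 ms.

Conclusion

There is no need to implement this case any other way than with child documents.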

Answer 1

Instead of modifying the documents, you can dynamically build a terms query from the user's purchase history.

curl -XGET "http://localhost:9200/demoproducts/_search" -d'
{
   "query": {
      "terms": {"id": ["34","112323","1223","32342","31234"]}
   }
}'
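
Building the query dynamically could look roughly like the sketch below. It assumes the purchase history is already available as a document shaped like the example in the question; how that document is fetched is left out, and the helper name `build_terms_query` and its `field` parameter are hypothetical.

```python
import json


def build_terms_query(purchased_ids, field="id"):
    """Build a terms query matching products with the given ids.

    Hypothetical helper: ids are sent as strings, as in the curl example.
    """
    return {"query": {"terms": {field: [str(pid) for pid in purchased_ids]}}}


# Purchase history shaped like the example document in the question.
history = {"userId": 1234, "purchesedProducts": [34, 112323, 1223, 32342, 31234]}

body = build_terms_query(history["purchesedProducts"])
print(json.dumps(body))
```

Note that a top-level terms query restricts the results to the purchased products. To boost them instead, the same id list could presumably be used as a terms filter inside the custom_filters_score query from the question.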
