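I'm facing a challenge: I need to reduce my indexed data by collapsing similar items into a single one. In effect, it is a kind of deduplication. For example: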
Product Name | Category | Product Group | Attributes
--------------------------------------------------------------------------------------------
1- Board Marker - Blue | Stationary | White Board Markers | Color = Blue, Type = Board
2- Board Marker - Green | Stationary | White Board Markers | Color = Green, Type = Board
In the new index, I need the data stored as:
Product Name | Category | Product Group
--------------------------------------------------------------------------------------------
1- Board Marker | Stationary | White Board Markers
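TL;DR: Find items that are at least 90% similar across fields such as product name, category, product group, and (some) attributes, and generate a new index to store them. I'm new to ElasticSearch, and a beginner in general.
PS: I asked this on the ElasticSearch forum but got no love. https://discuss.elastic.co/t/clustering-deduplicating-data-in-elasticsearch/96274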
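To make the "at least 90% similar" requirement concrete, here is a minimal client-side sketch in Python. It is not an ElasticSearch feature. It assumes the documents have already been fetched into memory, and the field names (`name`, `category`, `group`) and the third sample product are hypothetical. Similarity is measured with `difflib.SequenceMatcher` on the joined fields, which is only one possible definition of "similar". The pairwise comparison is quadratic, so it would only suit small data sets.

```python
from difflib import SequenceMatcher
from os.path import commonprefix

# Hypothetical field names; the real index mapping may differ.
FIELDS = ("name", "category", "group")

def record_key(product, fields=FIELDS):
    """Join the compared fields into one lowercase string."""
    return " | ".join(product[f].lower() for f in fields)

def similarity(a, b):
    """Character-level similarity in [0, 1] between two records."""
    return SequenceMatcher(None, record_key(a), record_key(b)).ratio()

def group_similar(products, threshold=0.9):
    """Greedily add each product to the first group whose first member is similar enough."""
    groups = []
    for product in products:
        for group in groups:
            if similarity(group[0], product) >= threshold:
                group.append(product)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([product])
    return groups

def merge(group):
    """Build one document per group, named by the shared prefix of the product names."""
    first = group[0]
    name = commonprefix([p["name"] for p in group]).rstrip(" -")
    return {
        "name": name or first["name"],
        "category": first["category"],
        "group": first["group"],
    }

products = [
    {"name": "Board Marker - Blue", "category": "Stationary", "group": "White Board Markers"},
    {"name": "Board Marker - Green", "category": "Stationary", "group": "White Board Markers"},
    {"name": "Ballpoint Pen - Black", "category": "Stationary", "group": "Pens"},  # made-up extra item
]

# Documents that would be written to the new index.
merged = [merge(g) for g in group_similar(products)]
```

With the sample data, the two board markers fall into one group and are stored as a single "Board Marker" document, while the pen stays on its own. Including selected attributes in the comparison would mean adding them to `FIELDS`.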