Question

I am storing structured data in Solr that looks roughly like this:

[{
    "Product": "Boomerang",
    "Price": 42,
    "Stores": ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
},
{
    "Product": "Juggling Chainsaws",
    "Price": 94,
    "Stores": ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
},
{
    "Product": "Chainsaw",
    "Price": 5,
    "Stores": ["Labor Store", "The Outdoor Shop", "Fish n Woodchips"]
}]

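There are thousands of products that share the same value for the "Stores" field.

Is there a way to avoid storing these identical values over and over without hurting search performance for queries such as "find chainsaws from Labor Store"?

Here is what I have in mind: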

[{
    "Product": "Boomerang",
    "Price": 42,
    "StoreGroup": "NoveltySportsStores"
},
{
    "Product": "Juggling Chainsaws",
    "Price": 94,
    "StoreGroup": "NoveltySportsStores"
},
{
    "Product": "Chainsaw",
    "Price": 5,
    "StoreGroup": "OutdoorsStores"
},
{
    "NoveltySportsStores": ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
},
{
    "OutdoorsStores": ["Labor Store", "The Outdoor Shop", "Fish n Woodchips"]
}]

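Edit: The example is completely made up. In my actual use case the groups stay constant, and each group is repeated about 5,000 times, with about 50,000 groups in total.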

Answer 1

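You are thinking of Solr/Lucene as if it were an RDBMS, which it is not. Even though the first layout looks repetitive and wasteful, it is not. The first approach is the natural and best way to index this data.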
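
To illustrate why the repetition is cheaper than it looks, here is a toy inverted index in Python. It is only a sketch of the general idea, not Lucene's actual data structures. The function names `build_inverted_index` and `search` are made up for this example. Each distinct store name becomes a single key with a list of document ids, so a query like "Chainsaw from Labor Store" is an intersection of two id lists.

```python
from collections import defaultdict

# Sample documents copied from the question (first, denormalized layout).
NOVELTY = ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
DOCS = [
    {"Product": "Boomerang", "Price": 42, "Stores": list(NOVELTY)},
    {"Product": "Juggling Chainsaws", "Price": 94, "Stores": list(NOVELTY)},
    {"Product": "Chainsaw", "Price": 5,
     "Stores": ["Labor Store", "The Outdoor Shop", "Fish n Woodchips"]},
]


def build_inverted_index(docs, field):
    """Map each distinct value of `field` to the ids of the docs containing it."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        values = doc[field]
        if not isinstance(values, list):
            values = [values]  # treat single-valued fields like multi-valued ones
        for value in set(values):
            index[value].append(doc_id)  # ids arrive in increasing order
    return dict(index)


def search(docs, product, store):
    """Return docs whose Product equals `product` and whose Stores contain `store`."""
    by_product = build_inverted_index(docs, "Product")
    by_store = build_inverted_index(docs, "Stores")
    hits = set(by_product.get(product, [])) & set(by_store.get(store, []))
    return [docs[i] for i in sorted(hits)]


if __name__ == "__main__":
    print(build_inverted_index(DOCS, "Stores"))
    print(search(DOCS, "Chainsaw", "Labor Store"))
```

In this toy version "The Outdoor Shop" appears once as a key, no matter how many products list it. Stored (retrievable) field values are a separate matter and are not modeled here.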

You could also make it work with the second approach, but the first one is better and simpler.
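
For the first layout, the "find chainsaws from Labor Store" query could be built roughly as follows. This is a minimal sketch under several assumptions: the host `http://localhost:8983` and the core name `products` are placeholders, `build_select_url` is a helper invented for this example, and `Product` and `Stores` are assumed to be indexed as exact-match (string) fields with `Stores` multi-valued. It only builds the request URL for Solr's `/select` handler and does not contact a server.

```python
from urllib.parse import urlencode


def quote_value(value):
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_select_url(base_url, core, product, store):
    """Build a /select URL matching one product, filtered by one store."""
    params = {
        "q": f"Product:{quote_value(product)}",  # main query on the product name
        "fq": f"Stores:{quote_value(store)}",    # filter query on the multi-valued field
        "wt": "json",                            # ask for a JSON response
    }
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host and core name; adjust to your own setup.
    print(build_select_url("http://localhost:8983", "products", "Chainsaw", "Labor Store"))
```

Putting the store restriction in `fq` rather than `q` keeps it out of relevance scoring and lets Solr cache the filter, which helps when the same store is queried repeatedly.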
