I'm storing structured data in Solr that looks a bit like this:
[{
"Product": "Boomerang",
"Price": 42,
"Stores": ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
},
{
"Product": "Juggling Chainsaws",
"Price": 94,
"Stores": ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
},
{
"Product": "Chainsaw",
"Price": 5,
"Stores": ["Labor Store", "The Outdoor Shop", "Fish n Woodchips"]
}]
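There are thousands of products that share the same value for the "Stores" field.
Is there a way to avoid storing these identical values over and over without hurting search performance for queries such as "find chainsaws from Labor Store"?
Here's what I was thinking: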
[{
"Product": "Boomerang",
"Price": 42,
"StoreGroup": "NoveltySportsStores"
},
{
"Product": "Juggling Chainsaws",
"Price": 94,
"StoreGroup": "NoveltySportsStores"
},
{
"Product": "Chainsaw",
"Price": 5,
"StoreGroup": "OutdoorsStores"
},
{
"NoveltySportsStores": ["Sport Shack", "Joe's Sport Supplies", "Sports and More", "The Outdoor Shop"]
},
{
"OutdoorsStores": ["Labor Store", "The Outdoor Shop", "Fish n Woodchips"]
}]
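Edit: The example is completely made up. In my actual use case the groups stay constant, each group is repeated about 5,000 times, and there are about 50,000 groups in total.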
Answer 0 (score: 3)
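You are thinking of Solr/Lucene as an RDBMS, which it is not. Even if the repetition feels excessive and wasteful, it isn't. The first approach is the natural and best way to index this data.
You could make the second approach work too, but the first is better and simpler.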
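To make the difference concrete, here is a minimal sketch of how the "find chainsaws from Labor Store" query might look under each layout. It only builds Solr request parameters and does not talk to a server. The `Product`, `Stores`, and `StoreGroup` field names come from the question; the `GroupName` field is an assumption, since Solr's join query parser needs the group documents to expose the group name as a field value rather than as a key. How `Product:"Chainsaw"` matches also depends on the field's type and analysis, which the question does not specify.

```python
from urllib.parse import urlencode


def phrase(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def denormalized_params(product, store):
    """First approach: each product document carries its own Stores list,
    so the store restriction is a plain filter query on the same document."""
    return {
        "q": f"Product:{phrase(product)}",
        "fq": f"Stores:{phrase(store)}",
    }


def grouped_params(product, store):
    """Second approach: store lists live in separate group documents.

    Assumes each group document looks like
    {"GroupName": "OutdoorsStores", "Stores": [...]} (GroupName is a
    hypothetical field, not part of the question). The join query parser
    finds group documents matching the store, then maps their GroupName
    values onto the StoreGroup field of the product documents."""
    return {
        "q": f"Product:{phrase(product)}",
        "fq": "{!join from=GroupName to=StoreGroup}" + f"Stores:{phrase(store)}",
    }


if __name__ == "__main__":
    for build in (denormalized_params, grouped_params):
        params = build("Chainsaw", "Labor Store")
        # URL-encoded query string that could be appended to a /select request
        print(urlencode(params))
```

The first variant is a single filter on the matched document, while the second needs a join across documents at query time, which is the extra complexity the answer is warning about.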