Question

Suppose I have a MongoDB query that looks like this:

result = db.collection.find(
    {
        'fruit_type': 'apple', 
        'fruit_name': 'macintosh',
        'primary_color': 'red', 
        'sheen': 'glossy', 
        'origin_label': 'true', 
        'stem_present': 'true', 
        'stem_leaves_present': 'true', 
        'blemish': 'none', 
        'firmness': 'moderate'
    }
)

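This works well when I have apples that meet every one of these criteria. But when nothing meets all of them, I still want apples.

Let's say the only mandatory criteria here are 'fruit_type': 'apple' and 'primary_color': 'red'.

For the rest, I want to match as many of the criteria as possible, without knowing in advance what the closest match will be. This isn't "fuzzy matching." It's more like "inexact matching" by field.

For instance, a result with 'sheen': 'matte' but identical values in all the other fields would be a valid but inexact result, given that there is no exact match. Or a result with 'stem_present': 'false' would be fine, if inexact.

In other words: if I specify 8 fields and values and there is no exact match, but there are documents matching 7 of them, 6 of them, and 5 of them, I want the ones matching 7 (and none of the others).

Put another way: if I type "red macintosh apple no stem leaves matte" into the Amazon Fruit search box, it will still show me red apples, even if it only has glossy ones. That's the effect I want to reproduce at the user level (assuming a natural-language query can be rendered completely into Mongo's query language).

One solution might be to write a giant $or query that spells out every permutation, but suppose I have 30 fields and lots of values. I don't want to specify everything up front, because I don't know in advance which combination of fields an incoming query will contain.

Is there an elegant (and efficient) way in MongoDB to back off or fall back from exact to inexact results? Or is the solution something beyond the query?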

Answer 1

Here's a simplified version of what I did (in Python) as a stopgap.

First, define a full match (this could also come in on the fly):

full_query = {
    'fruit_type': 'apple', 
    'fruit_name': 'macintosh',
    'primary_color': 'red', 
    'sheen': 'glossy', 
    'origin_label': 'true', 
    'stem_present': 'true', 
    'stem_leaves_present': 'true', 
    'blemish': 'none', 
    'firmness': 'moderate'
}

Then define the essential fields: the ones that must be present no matter what. (These have to be predefined somehow.)

essential_query = {
    'fruit_type': 'apple', 
    'primary_color': 'red' 
}

Then get all the matches from the essential query, and compare:

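from collections import defaultdict

def best_matches(full_query, essential_query):
    # Fetch every document that satisfies the mandatory fields.
    items = db.collection.find(essential_query)
    # Bucket the documents by how many fields of the full query they match.
    matches_by_count = defaultdict(list)
    for item in items:
        counter = 0
        for key in full_query:
            if full_query[key] == item.get(key):
                counter += 1
        matches_by_count[counter].append(item)
    return matches_by_count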

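You can then sort the resulting keys in reverse: the largest key holds the products that match exactly or most closely. (You could add functionality to tell you which of the two it is.) As you go down the keys, the matches get worse. You could also imagine weighting some fields, relaxing the equality check, and so on.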
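As a rough sketch of that last step, the helper below takes a dictionary shaped like the one best_matches returns and picks the bucket with the highest match count. The name closest_bucket and the in-memory sample data are illustrative assumptions, not part of the original code.

```python
def closest_bucket(matches_by_count):
    """Return (count, items) for the highest match count, or (0, []) if empty."""
    if not matches_by_count:
        return 0, []
    best = max(matches_by_count)
    return best, matches_by_count[best]


# Hypothetical result of best_matches(): keys are match counts.
sample = {
    7: [{'fruit_name': 'macintosh', 'sheen': 'matte'}],
    5: [{'fruit_name': 'gala', 'sheen': 'glossy'}],
}
count, items = closest_bucket(sample)
```

To walk through every bucket from best to worst instead, iterate over sorted(matches_by_count, reverse=True).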

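Edit

Here's another non-ideal possibility, which only works if you pre-compute your matches and don't care about matching performance. Define an exhaustive list of query fields, ordered from most to least essential.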

fields_essential_to_inessential = ['fruit_type', 'primary_color', 'sheen', 'origin_label', 'stem_present', 'stem_leaves_present', 'blemish', 'firmness', 'fruit_name']

When a query comes in, try it. If you don't get threshold matches, pop one of the fields and try again.

def compute_matches(full_query, fields_essential_to_inessential):
    exists = set()   # item ids already collected
    matches = []
    threshold = 20
    # Note: pop() below mutates the caller's list.
    while fields_essential_to_inessential:
        # Build a query from the fields that are still in play.
        query = {}
        for key in fields_essential_to_inessential:
            if full_query.get(key):
                query[key] = full_query[key]
        for item in db.products.find(query):
            if item['item-id'] not in exists:
                exists.add(item['item-id'])
                matches.append(item)
            if len(matches) == threshold:
                return matches
        # Not enough matches yet: drop the least essential field and retry.
        fields_essential_to_inessential.pop()
    return matches
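The relaxation order can also be separated from the database calls. The sketch below is one possible variant, not the original code: a generator that yields queries from strictest to loosest without mutating the field list, and that stops dropping fields once only the essential ones remain. The name relaxed_queries and the n_essential parameter are assumptions made for illustration.

```python
def relaxed_queries(full_query, ordered_fields, n_essential=2):
    """Yield queries from strictest to loosest, never dropping essential fields.

    ordered_fields is sorted from most to least essential; the first
    n_essential entries are always kept (an assumption of this sketch).
    """
    fields = [f for f in ordered_fields if f in full_query]
    # How many of the fields present in this query are mandatory.
    floor = sum(1 for f in ordered_fields[:n_essential] if f in full_query)
    for end in range(len(fields), max(floor, 1) - 1, -1):
        yield {key: full_query[key] for key in fields[:end]}


full_query = {
    'fruit_type': 'apple',
    'primary_color': 'red',
    'sheen': 'glossy',
    'firmness': 'moderate',
}
order = ['fruit_type', 'primary_color', 'sheen', 'firmness', 'fruit_name']
queries = list(relaxed_queries(full_query, order))
```

Each yielded dictionary can be passed to find() in turn, stopping as soon as enough results have been collected.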

Answer 2

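The closest you can get to ensuring that at least the mandatory criteria are met is to put all the optional query fields into an $or operator together with one of the mandatory fields, since $or selects the documents that satisfy at least one of its expressions: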

result = db.collection.find(
    {
        'fruit_type': 'apple',                       
        "$or": [ 
            { 'primary_color': 'red' },
            { 'fruit_name': 'macintosh' }, 
            { 'sheen': 'glossy' }, 
            { 'origin_label': 'true' }, 
            { 'stem_present': 'true' }, 
            { 'stem_leaves_present': 'true' }, 
            { 'blemish': 'none' }, 
            { 'firmness': 'moderate' }
        ]
    }
)

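The query above selects every document in the collection whose fruit_type is apple and that matches at least one clause inside $or. Because primary_color sits inside $or, it is not enforced on its own: a document with a different color is still returned if any other clause matches. If no apple matches any of the clauses, the query returns no documents.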
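One way to keep both mandatory fields strict while still ranking by the optional ones is to score documents on the server with an aggregation pipeline. This is a different technique from the $or query, shown only as a sketch: it assumes MongoDB 3.4 or later (for $addFields), and the names build_scoring_pipeline and match_score are made up for illustration.

```python
def build_scoring_pipeline(essential_query, optional_query, limit=20):
    """Build a pipeline that ranks documents by how many optional fields match."""
    if not optional_query:
        raise ValueError("optional_query must contain at least one field")
    # Each optional field adds 1 to the score when its value matches exactly.
    # $literal keeps the value from being parsed as a field path or operator.
    score_terms = [
        {'$cond': [{'$eq': ['$' + field, {'$literal': value}]}, 1, 0]}
        for field, value in optional_query.items()
    ]
    return [
        {'$match': essential_query},                             # mandatory fields
        {'$addFields': {'match_score': {'$add': score_terms}}},  # count matches
        {'$sort': {'match_score': -1}},                          # best first
        {'$limit': limit},
    ]


pipeline = build_scoring_pipeline(
    {'fruit_type': 'apple', 'primary_color': 'red'},
    {'sheen': 'glossy', 'firmness': 'moderate'},
)
# With PyMongo, this would be run as: db.collection.aggregate(pipeline)
```

The results come back sorted across all score levels, so a caller that wants only the closest matches can stop reading once match_score drops below the first document's value.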

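In terms of performance, if these are commonly issued queries, consider creating a compound index on the two mandatory fields, since scanning an index is much faster than scanning a collection.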
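A minimal sketch of creating that index, assuming the PyMongo driver and a collection object obtained from it (for example db.collection). The helper name ensure_mandatory_index is hypothetical.

```python
def ensure_mandatory_index(collection):
    """Create a compound index on the two mandatory fields (1 = ascending)."""
    # create_index returns the name of the index.
    return collection.create_index([('fruit_type', 1), ('primary_color', 1)])
```

Calling it repeatedly is harmless, since the same index specification is simply reused rather than rebuilt.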

For more details, see the documentation sections on Optimize Query Performance and Behaviors - $or Clauses and Indexes.

Matching fewer than all fields in MongoDB
