Question

This is my document model:

"translation" : {
        "en" : {
            "name" : "brown fox",
            "description" : "the quick brown fox jumps over a lazy dog"
        },
        "it" : {
            "name" : "brown fox ",
            "description" : " the quick brown fox jumps over a lazy dog"
        },
        "fr" : {
            "name" : "renard brun ",
            "description" : " le renard brun rapide saute par-dessus un chien paresseux"
        },
        "de" : {
            "name" : "brown fox ",
            "description" : " the quick brown fox jumps over a lazy dog"
        },
        "es" : {
            "name" : "brown fox ",
            "description" : " el rápido zorro marrón salta sobre un perro perezoso"
        }
    },

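Now I need to add a text index to the document above. How can I do that? I have already added a text index on translation, but it does not work because name and description sit under a language key (inside an object). I also need to give name and description separate text weights (text scores): name should have a weight of 5 and description a weight of 2. So I cannot use a wildcard text index, i.e.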

{'$**': 'text'}

I also tried 'translation.en.name': 'text', but that did not work. My languages are also dynamic and will keep growing, so what is the best solution for this case?

Any help is greatly appreciated.

Answer 1

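Since the embedded fields are dynamic, the best approach is to modify your schema so that the translation field becomes an array of embedded documents. An example of such a schema, mapping the current structure, follows: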

"translation": [    
    {
        "lang": "en",
        "name" : "brown fox",
        "description" : "the quick brown fox jumps over a lazy dog"
    },
    {
        "lang": "it",
        "name" : "brown fox ",
        "description" : " the quick brown fox jumps over a lazy dog"
    },
    {
        "lang": "fr",
        "name" : "renard brun ",
        "description" : " le renard brun rapide saute par-dessus un chien paresseux"
    },
    {
        "lang": "de",
        "name" : "brown fox ",
        "description" : " the quick brown fox jumps over a lazy dog"
    },
    {
        "lang": "es",
        "name" : "brown fox ",
        "description" : " el rápido zorro marrón salta sobre un perro perezoso"
    }
]

With this schema, it is easy to apply a text index on the name and description fields:

db.collection.createIndex(
    {
        "translation.name": "text",
        "translation.description": "text"
    }
)
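
The question also asked for separate weights (5 for name, 2 for description). A minimal sketch, assuming the array schema above: createIndex() accepts a weights option keyed by the indexed field paths. The index name translation_text is an illustrative assumption, and the typeof db guard is only there so the snippet can also be loaded outside the mongo shell.

```javascript
// Keys of the compound text index on the array schema.
var textIndexKeys = {
    "translation.name": "text",
    "translation.description": "text"
};

// Relative weights requested in the question: name 5, description 2.
var textIndexOptions = {
    "weights": {
        "translation.name": 5,
        "translation.description": 2
    },
    "name": "translation_text" // hypothetical index name
};

// Only talk to the server when running inside the mongo shell.
if (typeof db !== "undefined") {
    db.collection.createIndex(textIndexKeys, textIndexOptions);

    // Rank matches by their weighted text score.
    db.collection.find(
        { "$text": { "$search": "fox" } },
        { "score": { "$meta": "textScore" } }
    ).sort({ "score": { "$meta": "textScore" } });
}
```

A collection can have only one text index, so a text index created earlier on translation would need to be dropped before this one is created.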

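As for modifying the schema, you need an API that lets you update the collection in bulk, and the Bulk API does that for you. It performs better because operations are sent to the server in batches of 1000: instead of sending every request to the server individually, you send only one request per 1000 operations.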

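The following demonstrates this approach. The first example uses the Bulk API available in MongoDB versions >= 2.6 and < 3.2. It updates all the documents in the collection by changing every translation field to an array: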

var bulk = db.collection.initializeUnorderedBulkOp(),
    counter = 0;

db.collection.find({ 
    "translation": { 
        "$exists": true, 
        "$not": { "$type": 4 } 
    } 
}).snapshot().forEach(function (doc) {
    var localization = Object.keys(doc.translation)
        .map(function (key){
            var obj = doc["translation"][key];
            obj["lang"] = key;
            return obj;
        });
    bulk.find({ "_id": doc._id }).updateOne({ 
        "$set": { "translation": localization }
    });

    counter++;
    if (counter % 1000 === 0) {
        bulk.execute(); // Execute per 1000 operations 
        // re-initialize every 1000 update statements
        bulk = db.collection.initializeUnorderedBulkOp(); 
    }
})
// Clean up remaining operations in queue
if (counter % 1000 !== 0) { bulk.execute(); }
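
Before running the bulk update on real data, it may help to check the object-to-array conversion on its own. The following is a sketch of that step as a standalone function; the name toTranslationArray and the sample document are illustrative assumptions. Unlike the loop above, it copies each sub-document rather than adding lang to it in place.

```javascript
// Convert { en: {...}, fr: {...} } into [{ lang: "en", ... }, { lang: "fr", ... }].
// Each sub-document is copied, so the input object is left untouched.
function toTranslationArray(translation) {
    return Object.keys(translation).map(function (key) {
        var source = translation[key];
        var entry = { "lang": key };
        Object.keys(source).forEach(function (field) {
            entry[field] = source[field];
        });
        return entry;
    });
}

// Sample taken from the document in the question (two languages only).
var sample = {
    "en": {
        "name": "brown fox",
        "description": "the quick brown fox jumps over a lazy dog"
    },
    "fr": {
        "name": "renard brun ",
        "description": " le renard brun rapide saute par-dessus un chien paresseux"
    }
};

var converted = toTranslationArray(sample);
// converted[0].lang is "en" and converted[1].lang is "fr"
```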

The next example applies to the newer MongoDB version 3.2, which has deprecated the Bulk API and provides a newer set of APIs using bulkWrite().

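It uses the same cursor as above, but builds the array of bulk operations with the same forEach() cursor method, pushing each bulk write document onto the array. Because write commands can accept no more than 1000 operations, you need to group your operations into batches of at most 1000 and re-initialize the array when the loop reaches 1000 iterations: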

var cursor = db.collection.find({ 
        "translation": { 
            "$exists": true, 
            "$not": { "$type": 4 } 
        } 
    }).snapshot(),
    bulkUpdateOps = [];

cursor.forEach(function(doc){ 
    var localization = Object.keys(doc.translation)
        .map(function (key){
            var obj = doc["translation"][key];
            obj["lang"] = key;
            return obj;
        });
    bulkUpdateOps.push({ 
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "translation": localization } }
         }
    });

    if (bulkUpdateOps.length === 1000) {
        db.collection.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});         

if (bulkUpdateOps.length > 0) { db.collection.bulkWrite(bulkUpdateOps); }
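
The batching rule described above can also be isolated in a small helper. This is a sketch, not part of the original answer: bulkWriteInBatches is a hypothetical name, and fakeCollection is a stand-in that only records batch sizes so the grouping can be checked without a server. In the shell you would pass db.collection instead.

```javascript
// Send `operations` to `collection.bulkWrite()` in groups of at most `batchSize`.
// Returns the number of batches sent.
function bulkWriteInBatches(collection, operations, batchSize) {
    var batches = 0;
    for (var start = 0; start < operations.length; start += batchSize) {
        collection.bulkWrite(operations.slice(start, start + batchSize));
        batches++;
    }
    return batches;
}

// Stand-in collection that records batch sizes instead of writing to a server.
var recordedSizes = [];
var fakeCollection = {
    bulkWrite: function (ops) { recordedSizes.push(ops.length); }
};

// Build 2500 dummy update operations.
var ops = [];
for (var i = 0; i < 2500; i++) {
    ops.push({
        "updateOne": {
            "filter": { "_id": i },
            "update": { "$set": { "translation": [] } }
        }
    });
}

var sent = bulkWriteInBatches(fakeCollection, ops, 1000);
// recordedSizes is now [1000, 1000, 500]
```

Note that this helper needs the whole operations array in memory first, whereas the loop above flushes every 1000 documents as the cursor advances, which is the safer choice for large collections.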

Answer 2

To create an index on the name field, use db.collectionname.createIndex({"name": 'text'})

To make sure the index was created, list all the indexes with this command:

db.collectionname.getIndexes()

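Edit

Regarding the index creation method: the question is how to achieve this with the model above for all languages.

I see now. With the existing document schema you cannot index all the languages the way you want, so please change the schema. Below is one way you can achieve it: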

 {
 "_id" : 1,
 "translation" : [
         {       "language": "en",
                 "name" : "brown fox",
                 "description" : "the quick brown fox jumps over a lazy dog"
         },
         {       "language" : "it",
                 "name" : "brown fox ",
                 "description" : " the quick brown fox jumps over a lazy dog"
         },
         {       "language" :"fr",
                 "name" : "renard brun ",
                 "description" : " le renard brun rapide saute par-dessus un chien paresseux"
         },
         {       "language" : "de",
                 "name" : "brown fox ",
                 "description" : " the quick brown fox jumps over a lazy dog"
         },
         {       "language":"es",
                 "name" : "brown fox ",
                 "description" : " el rápido zorro marrón salta sobre un perro perezoso"
         }
 ]}

Then create the index as db.collectionname.createIndex({"language" : "text"});

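The above assumes the model you suggested, since name and description are keys inside translation rather than top-level fields. Isn't that so?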

No. With the schema I provided, it is easier to use a text index on the name and description fields, and you can search based on language.
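
To make the point about name, description, and language concrete, here is a hedged sketch for the array schema above. By default, MongoDB's text index reads a field named language from a document or embedded document to choose stemming and stop-word rules, so the text index itself goes on the name and description paths. The variable names are illustrative, and the language_override variant is my assumption of how a schema that calls the field lang would be handled.

```javascript
// Text index on the searchable fields; "language" in each array element
// is picked up by default as the language of that embedded document.
var perLanguageKeys = {
    "translation.name": "text",
    "translation.description": "text"
};
var perLanguageOptions = {
    "weights": { "translation.name": 5, "translation.description": 2 }
};

// Variant for a schema whose field is called "lang" instead of "language".
var langOverrideOptions = {
    "weights": { "translation.name": 5, "translation.description": 2 },
    "language_override": "lang"
};

// Search with French stemming and stop words.
var frenchQuery = { "$text": { "$search": "renard", "$language": "fr" } };

// Only talk to the server when running inside the mongo shell.
if (typeof db !== "undefined") {
    db.collectionname.createIndex(perLanguageKeys, perLanguageOptions);
    db.collectionname.find(frenchQuery);
}
```

The codes en, it, fr, de, and es are all languages the text index supports. Because the languages in the question are dynamic, it is worth checking that every code you store is supported, since an unsupported value in the language field is likely to make the write fail.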

Full text search in embedded documents
