Question

I am creating an index on a MongoDB collection with 10 million records, but I get this error:

db.logcollection.ensureIndex({"Module":1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 3,
        "ok" : 0,
        "errmsg" : "Btree::insert: key too large to index, failing play.logcollection.$Module_1 1100 { : \"RezGainUISystem.Net.WebException: The request was aborted: The request was canceled.\r\n   at System.Net.ConnectStream.InternalWrite(Boolean async, Byte...\" }",
        "code" : 17282
}

Please explain how I should create this index in MongoDB.

Answer 1

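MongoDB will not create an index on a collection if the index entry for an existing document exceeds the index key limit (1024 bytes). You can, however, create a hashed index or a text index instead: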

db.logcollection.createIndex({"Module":"hashed"})

or

db.logcollection.createIndex({"Module":"text"})
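Before choosing between a hashed and a text index, it may help to check how many documents actually exceed the limit. The following is a minimal Python sketch, not a MongoDB API: it assumes the documents are already available as plain dicts (for example, fetched with a driver), `approx_key_size` and `oversized_ids` are hypothetical helpers, and the estimate counts only the UTF-8 bytes of the value, ignoring BSON overhead, so values just under the limit may still fail.

```python
# Rough pre-check for the 1024-byte index key limit (hypothetical helpers).
INDEX_KEY_LIMIT = 1024  # bytes; an index entry must be smaller than this


def approx_key_size(value):
    """Approximate key size of a string: its UTF-8 byte length, without BSON overhead."""
    return len(value.encode("utf-8"))


def oversized_ids(docs, field, limit=INDEX_KEY_LIMIT):
    """Return the _id of each document whose string `field` would not fit in an index key."""
    return [
        doc["_id"]
        for doc in docs
        if isinstance(doc.get(field), str) and approx_key_size(doc[field]) >= limit
    ]


# Hypothetical sample documents.
docs = [
    {"_id": 1, "Module": "RezGainUI"},
    {"_id": 2, "Module": "x" * 1100},  # 1100 ASCII characters = 1100 bytes
    {"_id": 3, "Module": "é" * 600},   # 600 characters, but 1200 bytes in UTF-8
]
print(oversized_ids(docs, "Module"))  # [2, 3]
```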

Answer 2

You can silence this behavior by starting the mongod instance with the following command:

mongod --setParameter failIndexKeyTooLong=false

or by running the following command from the mongo shell:

db.getSiblingDB('admin').runCommand( { setParameter: 1, failIndexKeyTooLong: false } )

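If you are sure your field only rarely exceeds the limit, one way to solve this is to split the field (the one that makes the index exceed the limit) into parts whose byte length is below the limit, e.g. 1 KB. For a field val, I would split it into a tuple of fields val_1, val_2, and so on. Mongo stores text as valid UTF-8, which means you need a function that splits UTF-8 strings correctly, without cutting a multi-byte character in half.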

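def split_utf8(s, n):
    """
    Split the string s into UTF-8 encoded chunks of at most n bytes each,
    never cutting a multi-byte character in half.

    (b & 0xC0) == 0x80 checks whether byte b is a continuation byte (the inner
    part of a multi-byte character) rather than a single-byte character or the
    lead byte that says how many bytes the sequence has.

    An interesting aside, by the way. You can classify bytes in a UTF-8 stream as follows:

    With the high bit set to 0, it's a single-byte value.
    With the two high bits set to 10, it's a continuation byte.
    Otherwise, it's the first byte of a multi-byte sequence, and the number of leading 1 bits indicates how many bytes there are in total for this sequence (110... means two bytes, 1110... means three bytes, etc.).
    """
    if n < 4:
        # A UTF-8 character can take up to 4 bytes; smaller chunks could loop forever.
        raise ValueError("n must be at least 4")
    s = s.encode('utf-8')
    while len(s) > n:
        k = n
        # In Python 3, indexing bytes yields an int, so no ord() is needed.
        # Step back until s[k] is not a continuation byte, i.e. k is a character boundary.
        while (s[k] & 0xC0) == 0x80:
            k -= 1
        yield s[:k]
        s = s[k:]
    yield s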
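The split pieces are then stored as separate fields of the document. Below is a minimal Python sketch of that step under a few assumptions: `split_field` is a hypothetical helper, `max_bytes` should be at least 4 (the longest UTF-8 character) and in practice somewhat below 1024 to leave room for BSON overhead, and instead of slicing the encoded bytes like `split_utf8` it accumulates whole characters, which gives the same no-broken-characters guarantee. It returns `str` pieces rather than `bytes`, so a driver stores them as ordinary strings (PyMongo, for instance, saves Python `bytes` as BSON binary data).

```python
def split_field(doc, field, max_bytes):
    """Return a copy of doc with doc[field] replaced by field_1, field_2, ...

    Each piece is a str whose UTF-8 encoding is at most max_bytes long
    (assuming max_bytes >= 4), and no character is ever cut in half.
    Hypothetical helper for illustration.
    """
    pieces, current, size = [], [], 0
    for ch in doc[field]:
        ch_len = len(ch.encode("utf-8"))
        # Start a new piece if this character would push the current one over the limit.
        if current and size + ch_len > max_bytes:
            pieces.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += ch_len
    if current or not pieces:  # keep the last piece (or a single empty piece for "")
        pieces.append("".join(current))

    out = {k: v for k, v in doc.items() if k != field}
    for i, piece in enumerate(pieces, start=1):
        out[f"{field}_{i}"] = piece
    return out


print(split_field({"_id": 7, "val": "abcdé"}, "val", 3))
# {'_id': 7, 'val_1': 'abc', 'val_2': 'dé'}
```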

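Then you can define a compound index:

db.coll.ensureIndex({val_1: 1, val_2: 1, ...}, {background: true})

over all the val_i fields. Keep in mind that the 1024-byte limit applies to the entire index entry, so such a compound index only fits if all the val_i values together stay under the limit.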

or multiple indexes, one per val_i:

db.coll.ensureIndex({val_1: 1}, {background: true})
db.coll.ensureIndex({val_2: 1}, {background: true})
...
db.coll.ensureIndex({val_i: 1}, {background: true})

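Important: if you plan to use your field in a compound index, pay attention to the second argument of the split_utf8 function. For each document, you need to subtract the byte size of every other field value that makes up the index key; e.g. for the index {a: 1, b: 1, val: 1} the chunk size is 1024 - sizeof(value(a)) - sizeof(value(b)).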
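The budget from this note can be computed per document. A minimal Python sketch, assuming string values are sized by their UTF-8 byte length and numbers by 8 bytes (the size of a BSON double or 64-bit integer; a 32-bit integer takes only 4); `value_size` and `split_budget` are hypothetical helpers, and BSON overhead is ignored, so the result is approximate and a safety margin is advisable.

```python
INDEX_KEY_LIMIT = 1024  # bytes


def value_size(value):
    """Approximate byte size of one index key component (hypothetical helper).

    Strings count their UTF-8 bytes; numbers count 8 bytes. BSON overhead is ignored.
    """
    if isinstance(value, str):
        return len(value.encode("utf-8"))
    if isinstance(value, (int, float)):
        return 8
    raise TypeError(f"unsupported type: {type(value).__name__}")


def split_budget(doc, other_fields, limit=INDEX_KEY_LIMIT):
    """Bytes left for each val_i in an index {a: 1, b: 1, val_i: 1}:
    limit - sizeof(value(a)) - sizeof(value(b)) - ..."""
    return limit - sum(value_size(doc[f]) for f in other_fields)


doc = {"a": "user-42", "b": 2016, "val": "some very long text"}
print(split_budget(doc, ["a", "b"]))  # 1024 - 7 - 8 = 1009
```

The resulting number, minus some margin, would be what gets passed as the second argument of split_utf8 for that document.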

In any other case, use a hashed or text index.

Answer 3

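As several people have pointed out in their answers, the error key too large to index means that you are trying to create an index on a field or fields whose values exceed 1024 bytes.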

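In ASCII terms, 1024 bytes typically translates to around 1024 characters. Characters outside ASCII take 2 to 4 bytes each in UTF-8, so for such text the character limit is lower.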

There is no workaround for this, because it is an intrinsic limit set by MongoDB, as stated on the MongoDB Limits and Thresholds page:

The total size of an index entry, which can include structural overhead depending on the BSON type, must be less than 1024 bytes.

Turning off the failIndexKeyTooLong error (setting it to false) is not a solution either, as stated on the server parameters manual page:

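...these operations would successfully insert or modify a document, but the index or indexes would not include references to the document.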

What that sentence means is that the offending document will not be included in the index, and may be missing from query results.

For example:

> db.test.insert({_id: 0, a: "abc"})

> db.test.insert({_id: 1, a: "def"})

> db.test.insert({_id: 2, a: <string more than 1024 characters long>})

> db.adminCommand( { setParameter: 1, failIndexKeyTooLong: false } )

> db.test.find()
{"_id": 0, "a": "abc"}
{"_id": 1, "a": "def"}
{"_id": 2, "a": <string more than 1024 characters long>}
Fetched 3 record(s) in 2ms

> db.test.find({a: {$ne: "abc"}})
{"_id": 1, "a": "def"}
Fetched 1 record(s) in 1ms

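Because MongoDB was told to ignore the failIndexKeyTooLong error, the last query does not return the offending document (the document with _id: 2 is missing from the result), so the query produces a wrong result set. This assumes that field a has an index and that the query uses it; the transcript does not show that index being created.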
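Why the document disappears can be illustrated with a toy model. The Python sketch below is not MongoDB's implementation: `ToyIndex` is a hypothetical single-field index that, like an index built with failIndexKeyTooLong set to false, keeps the document in the collection but silently skips keys of 1024 bytes or more, and it answers the $ne query purely from its own entries, as an index scan would.

```python
KEY_LIMIT = 1024  # bytes


class ToyIndex:
    """Toy single-field index that silently skips oversized keys (illustration only)."""

    def __init__(self, field):
        self.field = field
        self.entries = {}  # key value -> list of _id values

    def add(self, doc):
        key = doc[self.field]
        if len(key.encode("utf-8")) >= KEY_LIMIT:
            return  # document stays in the collection, but gets no index entry
        self.entries.setdefault(key, []).append(doc["_id"])

    def ids_not_equal(self, value):
        """Answer {field: {$ne: value}} using only the index entries."""
        return sorted(i for k, ids in self.entries.items() if k != value for i in ids)


docs = [
    {"_id": 0, "a": "abc"},
    {"_id": 1, "a": "def"},
    {"_id": 2, "a": "x" * 2000},  # longer than the key limit
]
index = ToyIndex("a")
for d in docs:
    index.add(d)

print(sorted(d["_id"] for d in docs if d["a"] != "abc"))  # collection scan: [1, 2]
print(index.ids_not_equal("abc"))                          # index scan: [1]
```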

Answer 4

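When you hit the "index key limit", the solution depends on the needs of your schema. Only in extremely rare cases is key matching on values larger than 1024 bytes a design requirement. In fact, nearly all databases impose an index key size limit, but in legacy relational databases (Oracle / MySQL / PostgreSQL) it is usually somewhat configurable, so there you can adjust it more easily.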

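For fast searching, a "text" index is designed to optimize search and pattern matching on long text fields, and is well suited to that use case. More commonly, however, what is needed is a uniqueness constraint on long text values, and a "text" index does not behave like a unique scalar value with the unique flag set, { unique: true } (it behaves more like an array of all the text strings in the field).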

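Taking inspiration from MongoDB's GridFS, a uniqueness check can easily be implemented by adding an "md5" field to the document and creating a unique scalar index on it, somewhat like a custom unique hashed index. This allows a practically unlimited (~16 MB) text field length that is indexed for search and unique across the collection.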

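const md5 = require('md5');  // npm "md5" package: md5(string) returns a hex digest
const mongoose = require('mongoose');

let Schema = new mongoose.Schema({
  text: {
    type: String,
    required: true,
    trim: true,
    // Custom setter: whenever `text` is assigned, also store its MD5 digest in `md5`.
    set: function(v) {
        this.md5 = md5(v);
        return v;
    }
  },
  md5: {
    type: String,
    required: true,
    trim: true
  }
});

// Unique index on the short, fixed-length digest instead of the long text itself.
Schema.index({ md5: 1 }, { unique: true });
// Text index for searching inside the long field.
Schema.index({ text: "text" }, { background: true });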
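The same idea outside Mongoose, as a minimal Python sketch: the digest comes from the standard `hashlib` module, `md5_hex`, `make_doc` and `insert_unique` are hypothetical helpers, and a Python set stands in for the unique index on md5. It shows why the approach sidesteps the key limit: the hex digest is always 32 characters, however long the text is. Two different texts with the same MD5 would be treated as duplicates; for ordinary, non-adversarial data this is vanishingly unlikely.

```python
import hashlib


def md5_hex(text):
    """Hex MD5 digest of the UTF-8 text; always 32 characters long."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_doc(text):
    # Like the Mongoose setter: store the text together with its digest.
    return {"text": text, "md5": md5_hex(text)}


seen_digests = set()  # stands in for the unique index on "md5"


def insert_unique(doc):
    """Return False for a duplicate digest, as a unique index would reject it."""
    if doc["md5"] in seen_digests:
        return False
    seen_digests.add(doc["md5"])
    return True


long_text = "System.Net.WebException: The request was aborted. " * 20000  # about 1 MB
doc = make_doc(long_text)
print(len(doc["md5"]))                     # 32
print(insert_unique(doc))                  # True
print(insert_unique(make_doc(long_text)))  # False
```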

Answer 5

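In my case, I was trying to index a large array of subdocuments. When I looked at my query, it actually targeted a sub-property of a sub-property, so I changed the index to focus on that sub-property, and the index works fine.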

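Specifically, goals was the large array of subdocuments, the index that failed with "key too large" looked like {"goals": 1, "emailsDisabled": 1, "priorityEmailsDisabled": 1}, and the query looked like this: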

emailsDisabled: {$ne: true},
priorityEmailsDisabled: {$ne: true},
goals: {
  $elemMatch: {
    "topPriority.ymd": ymd,
  }
}

After changing the index to {"goals.topPriority.ymd": 1, "emailsDisabled": 1, "priorityEmailsDisabled": 1}, it works fine.

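To be clear, all I am sure of is that this change allowed me to create the index. Whether the index is actually used for that query is a separate question that I have not answered yet.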
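What each index stores can be seen with a simplified model of multikey key extraction. In this Python sketch, `index_keys` is a hypothetical helper (MongoDB's real rules for arrays and dotted paths cover more cases), the sample document is made up, and JSON length is only a rough stand-in for BSON size. Indexing `goals` turns every whole subdocument into a key, so one large goal is enough to exceed the limit, whereas `goals.topPriority.ymd` only stores the small ymd values.

```python
import json


def index_keys(doc, path):
    """Values a multikey index on the dotted `path` would store for doc (simplified)."""
    values = [doc]
    for part in path.split("."):
        found = []
        for v in values:
            # Arrays are traversed element by element, as in a multikey index.
            for item in (v if isinstance(v, list) else [v]):
                if isinstance(item, dict) and part in item:
                    found.append(item[part])
        values = found
    keys = []
    for v in values:
        # An array at the end of the path contributes one key per element.
        keys.extend(v if isinstance(v, list) else [v])
    return keys


# Hypothetical user document with a large array of goal subdocuments.
user = {
    "_id": 1,
    "emailsDisabled": False,
    "goals": [
        {"title": "x" * 2000, "topPriority": {"ymd": "2020-01-01"}},
        {"title": "short goal", "topPriority": {"ymd": "2020-01-02"}},
    ],
}

for path in ("goals", "goals.topPriority.ymd"):
    sizes = [len(json.dumps(k)) for k in index_keys(user, path)]
    print(path, sizes)
```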

Unable to create index in MongoDB: "key too large to index"