Indexing a key with long string data in MongoDB

Asked: 2018-08-12 04:57:26

Tags: ruby mongodb

I am using MongoDB with the mongo Ruby driver.

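The millions of documents I am working with contain very long strings, and I need to look documents up by that long string. The lookup is slow, taking around 5 seconds, so I need to index the key. When I try to index the key in MongoDB, I get the error key too large to index. According to the MongoDB docs this is expected, because index entries have a size limit: The total size of an index entry, which can include structural overhead depending on the BSON type, must be less than 1024 bytes.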
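To make the limit concrete, here is a minimal sketch of a size check in plain Ruby. The helper name fits_index_entry? and the OVERHEAD_BYTES margin are assumptions for illustration; the docs only say that overhead "can" be included and give no figure for it. The check uses bytesize, since the limit is expressed in bytes rather than characters.

```ruby
# Hypothetical helper: rough check of whether a string value could fit in
# a single index entry. OVERHEAD_BYTES is a guessed safety margin, not a
# documented figure.
INDEX_ENTRY_LIMIT = 1024
OVERHEAD_BYTES = 32

def fits_index_entry?(str)
  # bytesize counts encoded bytes; length would count characters instead
  str.bytesize + OVERHEAD_BYTES < INDEX_ENTRY_LIMIT
end

puts fits_index_entry?("a" * 500)   # a short value
puts fits_index_entry?("a" * 1024)  # a value of exactly 1024 single-byte characters
```

With any non-zero margin, a value of exactly 1024 single-byte characters already fails this check.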

One workaround is to split the key holding the long string into smaller chunks of less than 1024 bytes each.

# client is an existing Mongo::Client instance
client[:longStrColl].find.each do |doc|
    str = doc[:longStr]
    # split into parts of at most 1024 and at least 1 characters
    parts = str.scan(/.{1,1024}/)
    # map each part to its own key: str0, str1, ...
    str_parts = {}
    parts.each_with_index do |val, index|
        str_parts["str#{index}"] = val
    end
    # add the new keys to the existing document
    client[:longStrColl].update_one({ "_id" => doc["_id"] }, { "$set" => str_parts })
end

This splits longStr into 2 keys of at most 1024 characters each, as shown below, so that a compound index can be created on them to speed up lookups.

{
    "_id" : ObjectId("5b6c634dd0ae362168c8fd58"),
    "longStr" : "abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0",
    //other key: values 
    "str0" : "abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a",
    "str1" : "6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0"
}
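As a side note, the regular expression used for the split counts characters rather than bytes, and `.` does not match newlines, so any newline in the string would be dropped from the parts. A minimal sketch of a byte-based split in plain Ruby follows; the helper name split_by_bytes and the chunk size passed to it are hypothetical, not part of the driver.

```ruby
# Hypothetical helper: split a string into chunks of at most max_bytes
# bytes each, without cutting a multibyte character in half.
def split_by_bytes(str, max_bytes)
  chunks = []
  current = +""
  str.each_char do |ch|
    # start a new chunk when adding this character would exceed the limit
    if !current.empty? && current.bytesize + ch.bytesize > max_bytes
      chunks << current
      current = +""
    end
    current << ch
  end
  chunks << current unless current.empty?
  chunks
end

parts = split_by_bytes("abcdefghij", 4)
puts parts.inspect
```

Joining the chunks back together gives the original string, newlines included.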

Even with the string split this way, my index creation still fails with the error key too large to index:

db.longStrColl.createIndex( { str0: 1, str1: 1});
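One possible reading of this failure, assuming the limit quoted from the docs applies to the whole compound entry rather than to each field separately, is that the sizes of str0 and str1 add up. The sketch below only does that arithmetic in plain Ruby; the helper name compound_entry_bytes and the example lengths are made up for illustration, and structural overhead is ignored.

```ruby
# Hypothetical helper: sum the byte sizes of the indexed string fields
# of one document (BSON structural overhead is not included).
def compound_entry_bytes(doc, fields)
  fields.sum { |field| doc.fetch(field, "").bytesize }
end

# Made-up lengths: one full 1024-character part plus a shorter remainder
doc = { "str0" => "a" * 1024, "str1" => "b" * 700 }
total = compound_entry_bytes(doc, %w[str0 str1])
puts "approximate entry size: #{total} bytes (limit: 1024)"
```

Under that assumption, splitting the string across fields does not shrink the entry, because a compound index on all the parts carries the same bytes as the original string.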

How do I split the string correctly and index the parts?

0 answers:

No answers yet