I'm using MongoDB with the mongo Ruby driver.
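The millions of documents I'm working with contain very long strings, and I need to find documents by that long string. The lookup is slow, taking about 5 seconds, so I need to index the key. When I try to index the key in MongoDB, I get the error key too large to index.
According to the MongoDB docs this is expected, because index entries have a size limit: "The total size of an index entry, which can include structural overhead depending on the BSON type, must be less than 1024 bytes."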
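Note that the limit is stated in bytes, not characters. Here is a minimal sketch for checking a value against it, assuming UTF-8 strings and ignoring the structural overhead the docs mention. `fits_index_limit?` is a hypothetical helper I made up for illustration, not part of the driver:

```ruby
# Hypothetical helper: true when the raw string payload is under the limit.
# It ignores BSON structural overhead, so a result near the limit is not reliable.
def fits_index_limit?(str, limit = 1024)
  str.bytesize < limit
end

puts fits_index_limit?("a" * 1000)       # 1000 characters, 1000 bytes
puts fits_index_limit?("\u00E9" * 600)   # 600 characters, 1200 bytes in UTF-8
```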
One workaround is to split the key holding the long string into smaller chunks of less than 1024 bytes each.
client[:longStrColl].find.each do |doc|
  strParts = {}
  # Split into parts of at most 1024 characters (/m lets . match newlines too)
  doc[:longStr].scan(/.{1,1024}/m).each_with_index do |val, index|
    strParts["str#{index}"] = val
  end
  # $set adds the str0, str1, ... keys without replacing the rest of the document
  client[:longStrColl].update_one({ "_id" => doc["_id"] }, { "$set" => strParts })
end
This splits longStr into 2 keys of at most 1024 characters each, as shown below, so that a compound index can be created on them to speed up the lookup.
{
"_id" : ObjectId("5b6c634dd0ae362168c8fd58"),
"longStr" : "abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0",
//other key: values
"str0" : "abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a",
"str1" : "6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0abcdefghdefg1001955aa22920c1120006000abcdefghdefg1001955aa22920c1154384271cfbe669aed97284e39f7abcdefghdefg1001955aa229abcdefghdefg1001955aa22920c1120c11ccd35db2f509c73ef99a6bb65e2138e7df334d76cb73d011bc410a438f5e9a19a6bb65b003203a038037ab4c00009a496f48923660e0106143891a3d3b797367d1b0"
}
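The splitting step on its own, without the database, can be sketched as follows. `split_into_keys` is a hypothetical helper name, and the chunk size counts characters, not bytes, which is an assumption carried over from my loop above:

```ruby
# Hypothetical helper: split str into pieces of at most `size` characters and
# return them as a hash keyed "str0", "str1", ...
def split_into_keys(str, size = 1024)
  str.scan(/.{1,#{size}}/m).each_with_index.map do |part, index|
    ["str#{index}", part]
  end.to_h
end

parts = split_into_keys("a" * 1500)
puts parts.keys.inspect         # ["str0", "str1"]
puts parts["str0"].length       # 1024
puts parts["str1"].length       # 476
```

Joining the values back together in key order gives the original string, so nothing is lost by the split.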
Even so, creating the index still fails with the error key too large to index:
db.longStrColl.createIndex( { str0: 1, str1: 1});
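As a sanity check, the combined size of the parts can be computed before creating the index. This sketch assumes, from my reading of the docs quote, that the limit applies to the whole index entry, so the fields of a compound index would count together. I have not confirmed that. `compound_payload_bytes` is a hypothetical helper and ignores BSON overhead:

```ruby
# Hypothetical helper: sum of the byte sizes of the given string fields.
# Missing keys count as 0 bytes; BSON structural overhead is not included.
def compound_payload_bytes(doc, keys)
  keys.sum { |key| doc.fetch(key, "").bytesize }
end

doc = { "str0" => "a" * 1024, "str1" => "b" * 476 }
total = compound_payload_bytes(doc, %w[str0 str1])
puts total          # 1024 + 476 = 1500
puts total < 1024   # false
```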
How do I split the string correctly and index the parts?