MongoDB shard key for an email collection

Time: 2014-09-08 07:02:48

Tags: mongodb sharding mongodb-indexes

I am using MongoDB 2.6.1.

I have a collection that stores emails project-wise. The documents look like this (the raw email text key is omitted for readability):

{
        "_id" : ObjectId("540d4ae7eea013be22f1f0d6"),
        "Project_Id" : "E11593",
        "Project_Name" : "National Hearing Care- Novo",
        "Email_Id" : "E11593.monitor@lntinfotech.com",
        "Date" : "Mon Sep 08 05:05:35 IST 2014",
        "To" : "manisha.bhopate@infostretch.com; ",
        "From" : "Shubhangi Thorat",
        "CC" : "NO VALUES",
        "Subject" : "RE: pics",
        "Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}
{
        "_id" : ObjectId("540d4ae7eea013be22f1f0d7"),
        "Project_Id" : "E11593",
        "Project_Name" : "National Hearing Care- Novo",
        "Email_Id" : "E11593.monitor@lntinfotech.com",
        "Date" : "Mon Sep 08 05:02:38 IST 2014",
        "To" : "manisha.bhopate@infostretch.com; ",
        "From" : "Shubhangi Thorat",
        "CC" : "NO VALUES",
        "Subject" : "FW: pics",
        "Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}
{
        "_id" : ObjectId("540d4ae7eea013be22f1f0d8"),
        "Project_Id" : "E11593",
        "Project_Name" : "National Hearing Care- Novo",
        "Email_Id" : "E11593.monitor@lntinfotech.com",
        "Date" : "Mon Sep 08 04:37:47 IST 2014",
        "To" : "Prachi Sutrawe; ",
        "From" : "Mahindra Shambharkar",
        "CC" : "NO VALUES",
        "Subject" : "Accepted: Show and tell -Sale",
        "Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}

I have the following thoughts on choosing a shard key:

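  1. Build a compound index { Project_Id, _id }, since Project_Id has low cardinality but _id has high cardinality.
  2. A hashed index on "Date" / "Unique_Id", both of which are timestamps.
  3. A hashed index on the "From" field, but its cardinality depends on the number of people working on the project.
  4. "To" and "CC" are multi-valued keys, and "Subject" is highly random, so I am not sure whether these keys can be used at all.
  5. Although it is not listed in the output above, "Raw_Text" will be read extensively by different applications, but I am not sure whether I should build an index on this key, or even shard on it!

What would be the best shard key in this scenario?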
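
The cardinality concerns in the list above can be checked directly on a sample of documents. Below is a minimal sketch in plain Python, assuming only the three documents shown in the question; the `cardinality` helper and the `candidates` list are hypothetical names introduced for illustration, and `_id` is kept as its hex string rather than an ObjectId. Three documents are far too few to draw conclusions, so this only illustrates the kind of check one might run on a larger sample.

```python
# Sample documents copied from the question, reduced to the fields
# discussed as candidate shard keys (assumption: _id kept as a hex string).
docs = [
    {"_id": "540d4ae7eea013be22f1f0d6", "Project_Id": "E11593",
     "Date": "Mon Sep 08 05:05:35 IST 2014", "From": "Shubhangi Thorat",
     "Unique_Id": "Mon-Sep-08-11:51:20-IST-2014"},
    {"_id": "540d4ae7eea013be22f1f0d7", "Project_Id": "E11593",
     "Date": "Mon Sep 08 05:02:38 IST 2014", "From": "Shubhangi Thorat",
     "Unique_Id": "Mon-Sep-08-11:51:20-IST-2014"},
    {"_id": "540d4ae7eea013be22f1f0d8", "Project_Id": "E11593",
     "Date": "Mon Sep 08 04:37:47 IST 2014", "From": "Mahindra Shambharkar",
     "Unique_Id": "Mon-Sep-08-11:51:20-IST-2014"},
]


def cardinality(documents, fields):
    """Count the distinct value combinations of `fields` across `documents`."""
    return len({tuple(doc[f] for f in fields) for doc in documents})


# Hypothetical list of candidate keys taken from the ideas in the question.
candidates = [
    ("Project_Id",),
    ("Project_Id", "_id"),
    ("Date",),
    ("Unique_Id",),
    ("From",),
]

report = {fields: cardinality(docs, fields) for fields in candidates}

for fields, count in report.items():
    print(" + ".join(fields), "->", count, "distinct out of", len(docs))
```

On this sample, "Unique_Id" holds the same value in all three documents while "Date" differs in each, so the two timestamp fields may not be interchangeable as shard key candidates. Against the real collection, a comparable count could be taken in the mongo shell with `db.<collection>.distinct("<field>").length`, where the collection name is whatever the emails are stored in.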

0 answers:

No answers