Question

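Scenario:

10.000.000 records/day

Records: visitor, day of visit, cluster (where we saw the visitor), metadata

What we want to learn from this information:

Unique visitors on one or more clusters for a given date range.
Unique visitors by day.
Metadata grouped for a given range (platform, browser, etc.).

The model I have settled on, in order to query this information easily, is: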

{
    VisitorId: 1,
    ClusterVisit: [
        {clusterId: 1, dates: [date1, date2]},
        {clusterId: 2, dates: [date1, date3]}
    ]
}

Indexes:

by VisitorId (to ensure uniqueness)
by ClusterVisit.ClusterId-ClusterVisit.dates (for searching)
by IdUser-ClusterVisit.IdCluster (for updating)
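For readers who want to see the three indexes spelled out, here is a minimal sketch of them as key specifications. It assumes the field names from the model (`VisitorId`, `ClusterVisit.clusterId`, `ClusterVisit.dates`) and that `IdUser` / `IdCluster` in the list refer to those same fields. The collection name `visitors` in the comment is hypothetical.

```javascript
// Index specifications matching the list above (a sketch, not the asker's
// actual definitions). Field names are taken from the document model;
// treating IdUser/IdCluster as VisitorId/clusterId is an assumption.
const indexSpecs = [
  // One document per visitor.
  { keys: { VisitorId: 1 }, options: { unique: true } },
  // Searching by cluster and date.
  { keys: { "ClusterVisit.clusterId": 1, "ClusterVisit.dates": 1 }, options: {} },
  // Locating a (visitor, cluster) pair for updates.
  { keys: { VisitorId: 1, "ClusterVisit.clusterId": 1 }, options: {} },
];

// In the mongo shell these could be applied roughly as follows
// ("visitors" is a hypothetical collection name):
//   indexSpecs.forEach(s => db.visitors.createIndex(s.keys, s.options));
```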

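I also have to split groups of clusters into different collections in order to access the data more efficiently.

Importing: first, we search for the VisitorId - ClusterId combination and add the date to it.

Second: if the first step matches nothing, we upsert: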

    $addToSet: {VisitorId:1, 
        ClusterVisit: [{clusterId:1, dates:[date1]}]
    }

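Between the first and second import steps, I cover both the case where the clusterId does not exist and the case where the VisitorId does not exist.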
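The two import steps can be sketched as a pair of update descriptions plus a small driver. This is one reading of the procedure above, not the asker's actual code: the function names are made up for illustration, `runUpdate` is a caller-supplied function assumed to execute one update and return the number of matched documents, and the collection name `visitors` in the comment is hypothetical.

```javascript
// Step 1: the (visitor, cluster) pair already exists, so add the date to the
// matched array element via the positional "$" operator.
function addDateToExistingCluster(visitorId, clusterId, date) {
  return {
    filter: { VisitorId: visitorId, "ClusterVisit.clusterId": clusterId },
    update: { $addToSet: { "ClusterVisit.$.dates": date } },
    options: { upsert: false },
  };
}

// Step 2: nothing matched, so add a new cluster entry, creating the visitor
// document if it does not exist yet (upsert).
function upsertClusterEntry(visitorId, clusterId, date) {
  return {
    filter: { VisitorId: visitorId },
    update: { $addToSet: { ClusterVisit: { clusterId: clusterId, dates: [date] } } },
    options: { upsert: true },
  };
}

// Driver: try step 1, fall back to step 2 when step 1 matches nothing.
// runUpdate is an assumption: it executes one update and returns the
// matched-document count.
function importVisit(runUpdate, visitorId, clusterId, date) {
  const first = addDateToExistingCluster(visitorId, clusterId, date);
  if (runUpdate(first) > 0) {
    return "date-added";
  }
  runUpdate(upsertClusterEntry(visitorId, clusterId, date));
  return "cluster-upserted";
}

// Possible runUpdate in the shell (hypothetical collection "visitors"):
//   op => db.visitors.updateOne(op.filter, op.update, op.options).matchedCount
```

Note that every visit costs up to two round trips in this scheme, and both steps rewrite a document that keeps growing, which is consistent with the slowdown described in the question.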

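Problems: update/insert/upsert becomes completely inefficient (nearly impossible) as the collection grows, I guess because the documents get bigger each time a new date is added. It is also hard to maintain (mostly unsetting dates).

I have a collection with more than 50.000.000 documents that I cannot grow any further. It only manages 100 updates/sec.

I think the model I am using is not the best fit for this volume of information. What do you think would be best for getting more upserts/sec while still querying the information fast? Sharding will come after that, since it will take me more time to learn it and feel confident with it.

I have an x1.large instance on AWS with a 10-disk RAID 10 array.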

Answer 1

Arrays are expensive on large collections: mapreduce, aggregate ...

Try .explain(): MongoDB 'count()' is very slow. How do we refine/work around with it?

Add explicit index hints: Simple MongoDB query very slow although index is set

A full heap?: Insert performance of node-mongodb-native

Running out of memory space for the collection: How to improve performance of update() and save() in MongoDB?

Special read clustering: http://www.colinhowe.co.uk/2011/02/23/mongodb-performance-for-data-bigger-than-memor/

Global write lock?: mongodb bad performance

Tracking performance with the slow log: Track MongoDB performance?

Rotate your logs: Does logging output to an output file affect mongoDB performance?

Use the profiler: http://www.mongodb.org/display/DOCS/Database+Profiler

Move some of the collection cache into RAM: MongoDB preload documents into RAM for better performance

Some ideas about collection allocation size: MongoDB data schema performance

Use separate collections: MongoDB performance with growing data structure

A single query can only use one index (a compound index is better): Why is this mongodb query so slow?

A missing key?: Slow MongoDB query: can you explain why?

Maybe sharding: MongoDB's performance on aggregation queries

Stack Overflow links on improving performance: https://stackoverflow.com/a/7635093/602018

A good starting point for learning more about sharding and replication: https://education.10gen.com/courses
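The first point in the list (arrays are expensive) together with "use separate collections" can be read as a hint to avoid ever-growing arrays. The answer does not spell out a schema, so the following is only an illustrative sketch under that assumption: it flattens the question's nested document into one record per (visitor, cluster, date) and counts unique visitors over a date range in plain JavaScript. The function names are hypothetical, and dates are assumed to be ISO `YYYY-MM-DD` strings so that string comparison orders them correctly.

```javascript
// Flatten one nested visitor document into one row per (visitor, cluster, date).
function flattenVisitor(doc) {
  const rows = [];
  for (const visit of doc.ClusterVisit) {
    for (const date of visit.dates) {
      rows.push({ VisitorId: doc.VisitorId, clusterId: visit.clusterId, date: date });
    }
  }
  return rows;
}

// Count distinct visitors seen on any of the given clusters within
// [fromDate, toDate], inclusive. Dates are ISO strings (an assumption).
function countUniqueVisitors(rows, clusterIds, fromDate, toDate) {
  const wanted = new Set(clusterIds);
  const seen = new Set();
  for (const row of rows) {
    if (wanted.has(row.clusterId) && row.date >= fromDate && row.date <= toDate) {
      seen.add(row.VisitorId);
    }
  }
  return seen.size;
}
```

With flat rows like these, recording a visit is an insert of a small fixed-size record rather than an update to a growing document. Whether that trade-off pays off for this workload would need to be measured.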

MongoDB model for uniqueness
