Question

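As the title says, suppose you have a collection of posts. A Post has a userId (the author). Other users can share the post. Posts also have tags: an array of the tag IDs the post is categorized under. How should this be stored for fast retrieval?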

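Use case: you have connections. You see posts written by your connections, or posts your connections have shared. Posts have a "velocity" that determines their sort order on the page. A shared post could either inherit and keep the original post's velocity, or live or die by its own velocity. I'm not sure which is best.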

Options I've considered:

Post {id: uniquePostId, userId: authorId, shares: [userIds of those who shared], tagIds: [tagIds for post]}

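Problem with this approach: Mongo won't let you put two array fields in the same compound index (it refuses to index "parallel arrays"). So a query on both tagIds and shares is slow, and indexing each array separately still ends up close to a full collection scan.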
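To make the parallel-array limitation concrete, here is a small in-memory sketch in Python (not MongoDB code). It models the documented rule that, in a compound multikey index, each document may have at most one indexed field holding an array, and it shows the feed condition option 1 would need. The function names are hypothetical, chosen for illustration.

```python
from typing import Any

def array_fields(doc: dict[str, Any], fields: list[str]) -> list[str]:
    """Return the subset of `fields` whose value in `doc` is an array."""
    return [f for f in fields if isinstance(doc.get(f), list)]

def can_use_compound_index(doc: dict[str, Any], fields: list[str]) -> bool:
    """Model of MongoDB's compound multikey rule: at most one indexed
    field per document may hold an array ("parallel arrays" are rejected)."""
    return len(array_fields(doc, fields)) <= 1

def matches_option1(post: dict[str, Any], connections: set[int], tag_id: int) -> bool:
    """In-memory version of the option-1 feed condition: written or shared
    by a connection, and carrying the given tag."""
    by_connection = post["userId"] in connections or any(
        u in connections for u in post["shares"]
    )
    return by_connection and tag_id in post["tagIds"]

post = {"id": 1, "userId": 10, "shares": [20, 30], "tagIds": [4, 7]}
print(can_use_compound_index(post, ["shares", "tagIds"]))  # False: both are arrays
print(can_use_compound_index(post, ["userId", "tagIds"]))  # True: only tagIds is an array
print(matches_option1(post, {20, 99}, 4))                  # True: shared by 20, tagged 4
```

Because `shares` and `tagIds` can't share one compound index, the database has to satisfy one of the two conditions without index support, which is what makes this query slow.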

Another option:

You duplicate the post, like this:

Post {id: uniquePostId, userId: user who authored or shared the post, original: {postId: the original postId, or null if this is it, userId: the author of the original post}}

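Problem with this approach: say you want to fetch 20 posts, so you query for posts whose userId is in your connections. How do you handle duplicates when several of your connections shared the same original post? It gets a bit ugly.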
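A minimal in-memory sketch of the deduplication this option needs, assuming each document carries a numeric `velocity` field (hypothetical; the question only says posts have a velocity) and that every share keeps its own velocity rather than inheriting the original's. It sorts by velocity and keeps only the first document per original post. Against a real database the duplicates are discarded after the fetch, so you would have to over-fetch beyond 20, or group by the original post ID in an aggregation pipeline; the sketch below sidesteps that by working on the full list.

```python
from typing import Any, Optional

def root_id(post: dict[str, Any]) -> int:
    """ID of the underlying original post: the post's own id if it is an
    original, otherwise original.postId."""
    original: Optional[dict[str, Any]] = post.get("original")
    return post["id"] if original is None else original["postId"]

def build_feed(posts: list[dict[str, Any]], connections: set[int], limit: int = 20) -> list[dict[str, Any]]:
    """Return at most `limit` posts written or shared by a connection,
    highest velocity first, keeping one entry per original post."""
    candidates = [p for p in posts if p["userId"] in connections]
    candidates.sort(key=lambda p: p["velocity"], reverse=True)
    seen: set[int] = set()
    feed: list[dict[str, Any]] = []
    for p in candidates:
        rid = root_id(p)
        if rid in seen:
            continue  # another connection already surfaced this original post
        seen.add(rid)
        feed.append(p)
        if len(feed) == limit:
            break
    return feed

posts = [
    {"id": 1, "userId": 10, "original": None, "velocity": 5.0},
    {"id": 2, "userId": 20, "original": {"postId": 1, "userId": 10}, "velocity": 8.0},
    {"id": 3, "userId": 30, "original": {"postId": 1, "userId": 10}, "velocity": 3.0},
    {"id": 4, "userId": 30, "original": None, "velocity": 6.0},
]
feed = build_feed(posts, connections={10, 20, 30})
print([p["id"] for p in feed])  # [2, 4]
```

If shares should instead inherit the original's velocity, you would copy the original's value into each share when it is created; the deduplication step stays the same.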

Another approach I've read about:

post: {
 shares_and_tags: [{type: share, id: 1}, {type: tag, id:4}, ...]
}

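This seems to solve the indexing problem, but I'm not sure what the performance implications are in Mongo. I'll do some testing soon, but I wanted to see whether the community has any suggestions or experience. Thanks!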
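Here is a sketch of how the combined-array design could be queried, in Python and in memory, with the corresponding MongoDB filter built as plain data (not executed). The filter uses `$all` with two `$elemMatch` clauses so that one element must be a matching share and another a matching tag. Because `shares_and_tags.type` and `shares_and_tags.id` live in the same array, a compound index on those two paths is not a parallel-array index; `INDEX_KEYS` is the key list you would pass to pymongo's `Collection.create_index`. The helper names are hypothetical, and this says nothing about actual query performance, which still needs testing.

```python
from typing import Any, Iterable

# Key specification for a compound multikey index over one array of subdocuments.
INDEX_KEYS = [("shares_and_tags.type", 1), ("shares_and_tags.id", 1)]

def has_entry(entries: Iterable[dict[str, Any]], kind: str, ids: set[int]) -> bool:
    """True if some element has the given type and an id in `ids`
    (the in-memory equivalent of one $elemMatch clause)."""
    return any(e["type"] == kind and e["id"] in ids for e in entries)

def matches(post: dict[str, Any], sharers: set[int], tag_ids: set[int]) -> bool:
    """Post matches if shared by one of `sharers` AND tagged with one of `tag_ids`."""
    entries = post["shares_and_tags"]
    return has_entry(entries, "share", sharers) and has_entry(entries, "tag", tag_ids)

def mongo_filter(sharers: set[int], tag_ids: set[int]) -> dict[str, Any]:
    """Build the equivalent MongoDB query document."""
    return {"shares_and_tags": {"$all": [
        {"$elemMatch": {"type": "share", "id": {"$in": sorted(sharers)}}},
        {"$elemMatch": {"type": "tag", "id": {"$in": sorted(tag_ids)}}},
    ]}}

post = {"id": 1, "shares_and_tags": [{"type": "share", "id": 1}, {"type": "tag", "id": 4}]}
print(matches(post, {1, 2}, {4}))  # True
print(matches(post, {2}, {4}))     # False
```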

Answer 1

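OK, given the discussion in the comments:

Here is a tweet from Twitter's streaming API as it looks after being saved in MongoDB. I've removed some non-essential data from the object to simplify the example: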

{
    "_id" : ObjectId("4f2849353ac01aebf231408a"),
    "place" : null,
    "text" : "tweet text",
    "created_at" : "Tue Jan 31 20:04:05 +0000 2012",
    "retweet_count" : 0,
    "favorited" : false,
    "source" : "<a href=\"http://mobile.twitter.com\" rel=\"nofollow\">Mobile Web</a>",
    "in_reply_to_screen_name" : null,
    "in_reply_to_user_id" : null,
    "retweeted" : false,
    "in_reply_to_status_id" : null,
    "in_reply_to_status_id_str" : null,
    "id_str" : "123456767800304",
    "user" : {
    },
    "truncated" : false,
    "id" : NumberLong("1234567890"),
    "in_reply_to_user_id_str" : null,
    "entities" : {
        "hashtags" : [ ],
        "user_mentions" : [ ],
        "urls" : [ ]
    }
}

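As you can see, every tweet is stored as a new document. If it is a retweet, the retweeted flag is set to true, and the ID of the post it responds to, as well as the user it responds to, are referenced in top-level fields (in_reply_to_status_id, in_reply_to_user_id and their _str variants).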
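A minimal Python sketch of the pattern the answer describes: a share becomes its own document that copies the original's content, sets a flag, and references the original post and author in top-level fields. The names `user_id`, `shared_from_post_id` and `shared_from_user_id` are hypothetical (they are not Twitter's field names), and the ID generation is left to the caller.

```python
import copy
from typing import Any

def make_share(original: dict[str, Any], sharer_id: int, new_id: int) -> dict[str, Any]:
    """Store a share as its own document that copies the original content and
    points back to the original post and its author."""
    share = copy.deepcopy(original)  # don't mutate the original document
    share["id"] = new_id
    share["user_id"] = sharer_id     # hypothetical field: who shared it
    share["retweeted"] = True        # flag described in the answer
    # Hypothetical top-level reference fields:
    share["shared_from_post_id"] = original["id"]
    share["shared_from_user_id"] = original["user_id"]
    return share

original = {"id": 1234567890, "user_id": 42, "text": "tweet text", "retweeted": False}
share = make_share(original, sharer_id=7, new_id=1234567891)
print(share["shared_from_post_id"], share["user_id"], original["retweeted"])  # 1234567890 7 False
```

Since each share is a separate document indexed by its own `user_id`, a feed query over connections stays a simple indexed lookup; the trade-off is the duplicate handling raised in the question's second option.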

Efficient way to store post shares in MongoDB for a social site?
