Question

I am trying to create a nested data model in a Cassandra database that looks like this:

Forums = {
    forum001: {
        name: "General News",
        topics: {
            topic000001: {
                subject: "This is what I think",
                date: "2012-08-24 10:12:13",
                posts: {
                    post20120824.101213: { username: "tom", content: "Blah blah", datetime: "2012-08-24 10:12:13" },
                    post20120824.101513: { username: "dick", content: "Blah blah blah", datetime: "2012-08-24 10:15:13" },
                    post20120824.103213: { username: "harry", content: "Blah blah", datetime: "2012-08-24 10:32:13" }
                }
            },
            topic000002: {
                subject: "OMG Look at this",
                date: "2012-08-24 10:42:13",
                posts: {
                    post20120824.104213: { username: "tom", content: "Blah blah", datetime: "2012-08-24 10:42:13" },
                    post20120824.104523: { username: "dick", content: "Blah blah blah", datetime: "2012-08-24 10:45:23" },
                    post20120824.104821: { username: "harry", content: "Blah blah", datetime: "2012-08-24 10:48:21" }
                }
            }
        }
    },
    forum002: {
        name: "Specific News",
        topics: {
            topic000003: {
                subject: "Whinge whine",
                date: "2012-08-24 10:12:13",
                posts: {
                    post20120824.101213: { username: "tom", content: "Blah blah", datetime: "2012-08-24 10:12:13" },
                    post20120824.101513: { username: "dick", content: "Blah blah blah", datetime: "2012-08-24 10:15:13" }
                }
            }
        }
    }
}

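The basic design of the data is a bunch of nested maps. I have read that this is unwise because such a structure is difficult to query. What would be a better way to structure this data?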

Answer 1

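If you want to query by anything that can be ordered (such as the date in your example), that value needs to be part of the column name.

First, I would make the forum ID the row key, and the column family would look like this: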

*Row*: "forum001"<br>
=> *column*: "name" - *value*: "General News"<br>
=> *column*: "post::20120824101213::[some_uuid]" - *value*: "[serialized blob of data representing everything in the post]"<br>

From here you request the columns in the range post::201203* ~ post::201204* to get, for example, all posts from March.
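The range request above can be sketched without a cluster. This is a minimal illustration in plain Python, not Cassandra client code: the `row` dict, the `column_slice` helper, and the short `u0`..`u3` suffixes are all hypothetical stand-ins. It only shows why string-sortable column names make a "start, end" slice return one month of posts.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for one row: column name -> value.
row = {
    "name": "General News",
    "post::20120229235959::u0": "feb post",
    "post::20120301090000::u1": "march post 1",
    "post::20120315120000::u2": "march post 2",
    "post::20120401000000::u3": "april post",
}

def column_slice(row, start, end):
    """Return (name, value) pairs with start <= name < end, in sorted order."""
    names = sorted(row)               # columns within a row are kept sorted
    lo = bisect_left(names, start)    # first column name >= start
    hi = bisect_left(names, end)      # first column name >= end
    return [(n, row[n]) for n in names[lo:hi]]

# All posts from March 2012: everything from "post::201203" up to "post::201204".
march_posts = column_slice(row, "post::201203", "post::201204")
for name, value in march_posts:
    print(name, "=>", value)
```

The "name" column and the February and April posts fall outside the slice because they sort before or after the two bounds.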

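Keep in mind that rows are distributed randomly across your Cassandra cluster (assuming you keep Cassandra's default settings, which is the recommended choice). The columns of a given row live on the same node and are kept sorted, which is why you can use them for range queries.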

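For column names, I like to use the type of the object serialized in the column as a prefix (that way I can keep many types in the same row). Then you have a choice of how to represent the date in the column name: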

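ISO format date + a random UUID: the ISO format gives you readability when debugging and sorts correctly as a string, and the appended UUID guarantees that the column name is unique (otherwise you might accidentally overwrite a column during high-traffic periods).
TimeUUID: gives you both the time and the uniqueness in one value, but you will not be able to read the date by eye in the Cassandra console tools.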
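Both naming options can be sketched with Python's standard `uuid` module. The helper names below (`iso_uuid_column`, `timeuuid_column`, `timeuuid_to_datetime`) are made up for illustration, and the compact `YYYYMMDDhhmmss` stamp mirrors the column layout shown earlier rather than any required format.

```python
import uuid
from datetime import datetime, timezone

def iso_uuid_column(prefix, when):
    """Option 1: readable, string-sortable timestamp plus a random UUID."""
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{prefix}::{stamp}::{uuid.uuid4()}"

def timeuuid_column():
    """Option 2: a version 1 (time-based) UUID carries time and uniqueness."""
    return uuid.uuid1()

# Number of 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def timeuuid_to_datetime(u):
    """Recover the timestamp embedded in a version 1 UUID."""
    seconds = (u.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

when = datetime(2012, 8, 24, 10, 12, 13)
print(iso_uuid_column("post", when))   # date is readable at a glance
t = timeuuid_column()
print(t, timeuuid_to_datetime(t))      # date needs decoding
```

Note that the text form of a version 1 UUID does not sort chronologically as a plain string; time ordering depends on the store comparing the values as TimeUUIDs.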

You will need a different row for every kind of query criterion (author, date, size...), so denormalize.
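As a rough sketch of what that denormalization could look like, the snippet below writes each post twice: once under the forum's row and once under a per-author row. Everything here is an assumption for illustration: the `store` dict stands in for a column family, and `save_post` and the `user::` row-key prefix are hypothetical names.

```python
import json
from collections import defaultdict

# Hypothetical in-memory stand-in for a column family:
# row key -> {column name -> serialized value}.
store = defaultdict(dict)

def save_post(forum_id, username, stamp, post_id, content):
    """Write the same serialized post under one row per query criterion."""
    column = f"post::{stamp}::{post_id}"
    blob = json.dumps({"username": username, "content": content, "datetime": stamp})
    store[forum_id][column] = blob              # query: posts in a forum by date
    store[f"user::{username}"][column] = blob   # query: posts by an author by date
    return column

col = save_post("forum001", "tom", "20120824101213", "id-1", "Blah blah")
save_post("forum001", "dick", "20120824101513", "id-2", "Blah blah blah")

print(sorted(store["forum001"]))   # both posts, ordered by date
print(sorted(store["user::tom"]))  # only tom's post
```

The cost is duplicated data and an extra write per criterion; the benefit is that each query becomes a single-row column slice.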

A good read (I think I have pasted this a thousand times) is this two-part article from eBay:
Cassandra Data Modeling Best Practices, Part 1
Cassandra Data Modeling Best Practices, Part 2
