Question

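I need some insight into the data model design for my project. My project is a real-time recommendation system. It has collections of recommendation algorithms, which look like this:

collection1 { algorithm1, algorithm5, algorithm6 }

collection2 { algorithm5, algorithm6, algorithm7, algorithm8 }

etc.

For each algorithm in a collection I need to store data such as success, selection probability, and score every 2 minutes. I chose Cassandra as my data store because it is well suited to time series. I need to store the data so that it can later be displayed in graphs and charts. Do you think my data model is OK? This is what I did: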

CREATE TABLE algorithm_by_collection_and_date (
   algorithm_id text,
   collection_id text,
   date text,
   event_time timestamp,
   score double,
   probability double,
   PRIMARY KEY ((algorithm_id,collection_id,date),event_time)
);

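So it is designed like row partitioning: adding the date to the row key limits the number of columns stored per algorithm in a collection to one day's worth.

What do you think about this? Thanks, Jan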
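
To make the effect of the date bucket concrete, here is a minimal Python sketch of the arithmetic and of the reads a chart would need. It assumes the `date` column holds a UTC day formatted as YYYY-MM-DD (the question does not state the format), and the helper names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

SAMPLE_INTERVAL_SECONDS = 120  # one measurement every 2 minutes, per the question


def rows_per_day_partition(interval_seconds: int = SAMPLE_INTERVAL_SECONDS) -> int:
    """Rows that end up in one (algorithm_id, collection_id, date) partition."""
    return 24 * 60 * 60 // interval_seconds


def date_bucket(event_time: datetime) -> str:
    """Value for the `date` column. The UTC YYYY-MM-DD format is an assumption."""
    return event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")


def buckets_between(first: date, last: date) -> list[str]:
    """Every date bucket a chart covering first..last (inclusive) has to read."""
    days = (last - first).days
    return [(first + timedelta(days=i)).isoformat() for i in range(days + 1)]


# Read of a single partition; the ? bind markers are for a prepared statement.
SELECT_ONE_DAY = (
    "SELECT event_time, score, probability "
    "FROM algorithm_by_collection_and_date "
    "WHERE algorithm_id = ? AND collection_id = ? AND date = ?"
)
```

With a 2-minute interval each partition stays at 720 rows, and a chart spanning several days would run `SELECT_ONE_DAY` once per bucket returned by `buckets_between`.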

Answer 1

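I would use this structure; it lets you normalize your model and makes it cleaner. I put it together in a hurry, so please add the correct data types for the columns, as well as referential integrity constraints.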

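-- Lookup table for algorithms.
CREATE TABLE algorithm
(
    algorithmId uuid PRIMARY KEY,
    algorithmName text
);

-- Lookup table for collections.
CREATE TABLE collection
(
    collectionID uuid PRIMARY KEY,
    collectionName text
);

-- Membership of an algorithm in a collection.
-- The uuid types on the two reference columns are assumed from the keys above.
CREATE TABLE algo_collection
(
    algoCollectionID uuid PRIMARY KEY,
    collectionID uuid,
    algorithmID uuid
);

-- Measurements per algorithm/collection pair.
-- Column types are assumed from the question's table. Cassandra requires a
-- primary key, so the one below is an assumption that mirrors the question.
CREATE TABLE recommendation
(
    algoCollectionID uuid,
    date text,
    event_time timestamp,
    score double,
    probability double,
    PRIMARY KEY ((algoCollectionID, date), event_time)
);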
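
As a rough illustration of how a client might read this layout: CQL has no JOIN, so the pair lookup and the measurement read would be two separate steps. The Python sketch below uses in-memory dicts as stand-ins for the tables. All ids and values are invented, `read_series` is a hypothetical helper, and keying the mapping by (collection, algorithm) is an assumption of the sketch rather than part of the schema above.

```python
# In-memory stand-ins for the tables; every id and value here is invented.
algo_collection = {
    # (collectionID, algorithmID) -> algoCollectionID
    ("collection1", "algorithm5"): "ac-1",
    ("collection2", "algorithm5"): "ac-2",
}

recommendation = {
    # (algoCollectionID, date) -> rows of (event_time, score, probability)
    ("ac-1", "2014-03-05"): [
        ("00:00", 0.71, 0.30),
        ("00:02", 0.69, 0.32),
    ],
}


def read_series(collection_id, algorithm_id, day):
    """Two-step read: resolve the pair id first, then fetch that day's rows."""
    algo_collection_id = algo_collection[(collection_id, algorithm_id)]
    return recommendation.get((algo_collection_id, day), [])
```

For example, `read_series("collection1", "algorithm5", "2014-03-05")` resolves the pair to its id and then returns the two sample rows stored for that day.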

NoSQL (Cassandra) data model for my project
