Question

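I have been experimenting with Cassandra and need some help understanding a few issues. I am new to Cassandra, and I am worried that translating a MySQL database to Cassandra will lead me into pitfalls because of my lack of experience and limited knowledge of Cassandra. I am hoping to get some useful information from experienced Cassandra users and developers.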

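Below are the sample keyspaces I created. If someone experienced could point them out, I would like to know about any flaws in the design.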

create keyspace Students with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:1};
use Students;
create column family StudentID with column_type = 'Super' and comparator = 'UTF8Type' and subcomparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and column_metadata = 
[{column_name : 'First Name', validation_class : UTF8Type}, 
{column_name : 'Last Name', validation_class : UTF8Type}, 
{column_name : 'Subjects', validation_class : UTF8Type}, 
{column_name : 'Class', validation_class : UTF8Type}];


 set StudentID[utf8('1968')]['00001']['First Name'] = 'Mark';
 set StudentID[utf8('1968')]['00001']['Last Name'] = 'Myers';
 set StudentID[utf8('1968')]['00001']['Subjects'] = 'Maths, Chemistry';
 set StudentID[utf8('1968')]['00001']['Class'] = '10th grade';


create keyspace Teachers with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:1};
use Teachers;
create column family TeacherID with column_type = 'Super' and comparator = 'UTF8Type' and subcomparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and column_metadata = 
[{column_name : 'First Name', validation_class : UTF8Type}, 
{column_name : 'Last Name', validation_class : UTF8Type}, 
{column_name : 'Subjects', validation_class : UTF8Type}, 
{column_name : 'Class', validation_class : UTF8Type}];

set TeacherID[utf8('777')]['234-333']['First Name'] = 'Mark';
set TeacherID[utf8('777')]['234-333']['Last Name'] = 'Myers';
set TeacherID[utf8('777')]['234-333']['Subjects'] = 'Maths, Chemistry,physics';
set TeacherID[utf8('777')]['234-333']['Class'] = '10th grade, 11th grade, 9th grade';



create keyspace Subjects with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:1};
use Subjects;
create column family SubjectNames with default_validation_class = 'UTF8Type' and comparator = 'UTF8Type' and column_metadata = 
[{column_name : 'Names1', validation_class : UTF8Type}];


set SubjectNames[utf8('Current')]['Name1']= 'maths';
set SubjectNames[utf8('Current')]['Name2']= 'physics';
set SubjectNames[utf8('Current')]['Name3']= 'chemistry';
set SubjectNames[utf8('Current')]['Name4']= 'CS';

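That makes three keyspaces: Students, Teachers and Subjects. I definitely need some relationship between these keyspaces, and I also need to query the data, e.g.: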

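Query the students who have a certain subject and/or class
Query the teachers who have a certain class
List all the subjects that a certain student in a certain class studies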

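As far as I understand, I definitely need to create secondary indexes for these queries to work, that is, to retrieve data matching certain clauses.

Am I correct in understanding the following?

There is no 'like' clause in Cassandra.
The value of each column (the last key-value pair) has to be broken up into individual words. Say I want to get a list of subjects: each subject then has to sit in its own column. I cannot query against a column value like "subjectA, subjectB"; instead I have to break it up into SubjectA and SubjectB and put them in different columns.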

The keyspaces are shown below.

students subject teachers

Answer 1

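First of all, is Cassandra the right tool for the job you are trying to do? Cassandra does very well at handling distributed, loosely coupled data sets that need high-speed reads and writes, but it starts to become unwieldy when you try to enforce a relational model on top of it, hence my question. If you have a highly relational data set like the example you show here, where the emphasis is on determining the relationships between pieces of information, then MySQL will be a better tool than Cassandra.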

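I think you are mistaking keyspaces for a 1-1 mapping to MySQL tables. A keyspace corresponds more directly to a database than to a table within a database. To start with, you probably want to redesign your keyspace setup to put everything together, as follows: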

keyspace: School
Column Family: Student ; Row Key: StudentID ; Col1 = First Name, Col2 = Last Name, Col3 = subjects, Col4 = class.

Repeat this for the other two column families. I am not sure super columns are needed.
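As a rough mental model of that layout, the single keyspace can be pictured as nested maps: keyspace, then column family, then row key, then column name and value. The sketch below uses plain Python dictionaries as a stand-in and does not talk to Cassandra at all. The sample values come from the question; the names `school` and `get_row` are made up for illustration.

```python
# In-memory model only: keyspace -> column family -> row key -> columns.
school = {
    "Student": {
        "00001": {
            "First Name": "Mark",
            "Last Name": "Myers",
            "Subjects": "Maths, Chemistry",
            "Class": "10th grade",
        },
    },
    "Teacher": {
        "234-333": {
            "First Name": "Mark",
            "Last Name": "Myers",
            "Subjects": "Maths, Chemistry,physics",
            "Class": "10th grade, 11th grade, 9th grade",
        },
    },
}


def get_row(column_family, row_key):
    """Fetch one row by its key; an unknown key yields an empty row."""
    return school[column_family].get(row_key, {})


student = get_row("Student", "00001")
```

Looking a row up by its key is the access path this layout favours; anything else (by class, by subject) needs either an index or an extra column family.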

To do the cross-cutting retrievals, you would create a column family such as:

Column Family: Class ; RowKey: ClassId (ie 10th Grade) ; col1= (TeacherId:TeacherId), Col2 = (StudentId:StudentId)

This builds a relationship column family between a particular class and everyone who belongs to it.
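To make the idea concrete, here is a minimal sketch of such a relationship column family, again modelled with in-memory Python dictionaries rather than a real Cassandra client. The column naming scheme (`Teacher:<id>`, `Student:<id>`) and the helper names `enroll` and `members` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the "Class" column family:
# row key = class id, column name = "Teacher:<id>" or "Student:<id>".
class_cf = defaultdict(dict)


def enroll(class_id, kind, member_id):
    """Record membership by writing one column into the class row."""
    class_cf[class_id][f"{kind}:{member_id}"] = member_id


def members(class_id, kind):
    """Return the ids of one kind, read from a single row (no join)."""
    prefix = f"{kind}:"
    row = class_cf.get(class_id, {})
    return sorted(v for k, v in row.items() if k.startswith(prefix))


enroll("10th grade", "Student", "00001")
enroll("10th grade", "Teacher", "234-333")
enroll("11th grade", "Teacher", "234-333")
```

With this in place, `members("10th grade", "Teacher")` answers "which teachers have a certain class" from one row. The cost is that every write to a student or teacher also has to write to the class row; that is the boilerplate this approach carries.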

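Breaking up values
Yes, you need to break them up by subject and put them into their own column family. Note that you can use secondary indices (available since Cassandra 0.7), which let you do more equality-type queries, such as: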

get users where birth_date = 1973;
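The "break them up" step can be sketched as follows. This is an in-memory illustration only, not Cassandra code: the `Subject` column family layout (row key = subject, one column per student id), the second student `00002`, and the choice to lower-case subject names are all assumptions. Normalising case matters because lookups are exact-match only.

```python
def split_values(raw):
    """Split a comma-separated value into trimmed, lower-cased items."""
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


# Hypothetical "Subject" column family:
# row key = subject name, one column per student id (value unused).
subject_cf = {}


def add_student(student_id, raw_subjects):
    """Write one column per subject instead of one combined string."""
    for subject in split_values(raw_subjects):
        subject_cf.setdefault(subject, {})[student_id] = ""


add_student("00001", "Maths, Chemistry")
add_student("00002", "Chemistry,physics")  # made-up second student

# Equality lookup: all students taking chemistry.
chemistry_students = sorted(subject_cf.get("chemistry", {}))
```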

For guidance on when to use secondary indices, see this document. The relevant quote is:

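Cassandra's built-in secondary indexes are best for cases when many rows contain the indexed value. The more unique values that exist in a particular column, the more overhead you will have, on average, to query and maintain the index. For example, suppose you had a user table with a billion users and wanted to look up users by the state they lived in. Many users will share the same column value for state (such as CA, NY, TX, etc.). This would be a good candidate for a secondary index. On the other hand, if you wanted to look up users by their email address (a value that is typically unique for each user), it may be more efficient to manually maintain a dynamic column family as a form of an "index". Even for columns containing unique data, it is often fine performance-wise to use secondary indexes for convenience, as long as the query volume to the indexed column family is moderate and not under constant load.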
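A manually maintained "index" of the kind the quote mentions for unique values might look like the sketch below. It is an in-memory model under assumed names (`users`, `users_by_email`, `save_user`, `find_by_email`), not a Cassandra client. The point is that the application writes the index row itself and has to clean up stale entries when the indexed value changes.

```python
users = {}           # row key = user id
users_by_email = {}  # row key = email, one column pointing at the user id


def save_user(user_id, email, state):
    """Write the user row and keep the hand-rolled index in step."""
    old = users.get(user_id)
    if old and old["email"] != email:
        users_by_email.pop(old["email"], None)  # drop the stale entry
    users[user_id] = {"email": email, "state": state}
    users_by_email[email] = {"user_id": user_id}


def find_by_email(email):
    """Two key lookups: index row first, then the user row."""
    entry = users_by_email.get(email)
    return users[entry["user_id"]] if entry else None


save_user("u1", "mark@example.com", "CA")
save_user("u1", "mark.myers@example.com", "CA")  # email change
```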

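If you have not seen it already, the DataStax website will answer many of your Cassandra questions, and I highly recommend browsing it if you are going to work extensively with Cassandra.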

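In short, your two options are either to break the items apart and create a column family for each relationship you want to maintain, or to use secondary indexes, depending on how you separate your data. I personally prefer the former approach, despite the boilerplate, because I think it scales better.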

Need advice on Cassandra keyspaces sample
