Question

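I have several customers, and each customer is represented by a "tenant".

I would like to know the best way to model this concept. I did a lot of research and found this thread: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Modeling-multi-tenanted-Cassandra-schema-td7591311.html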

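I understand there are several possibilities:

1. One keyspace per tenant
2. One table (column family) per tenant
3. One field identifying the tenant in every table

I chose solution 3, but I'm not sure my schema is the best one for performance.

Here is my profiles schema: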

CREATE TABLE profiles (
  id timeuuid,
  tenant text,
  email text,
  datasources set<text>,
  info map<text, text>,
  friends set<timeuuid>,
  PRIMARY KEY(id, tenant)
);

CREATE INDEX ON profiles(datasources);
CREATE INDEX ON profiles(email);

My PARTITION KEY is "id", for uniqueness, and my CLUSTERING KEY is "tenant". I need to be able to run these queries as fast as possible:

SELECT * FROM profiles WHERE id = x
SELECT * FROM profiles WHERE tenant = x
SELECT * FROM profiles WHERE email = x
SELECT * FROM profiles WHERE datasources CONTAINS x

The queries work, but I'm wondering whether it would be better to have "tenant" as the PARTITION KEY instead of "id", and to use "id" as the CLUSTERING KEY:

CREATE TABLE profiles (
  ...
  PRIMARY KEY(tenant, id)
);

In my application "tenant" is always a required field, so running the same queries this way would not be a problem (but would it be faster or slower?):

SELECT * FROM profiles WHERE tenant = y
SELECT * FROM profiles WHERE tenant = y AND id = x
SELECT * FROM profiles WHERE tenant = y AND email = x
SELECT * FROM profiles WHERE tenant = y AND datasources CONTAINS x

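Bonus advantage: the ability to sort profiles by creation date (ORDER BY id).

If I understand correctly, with tenant as the PARTITION KEY, Cassandra will physically store all elements of the same tenant in the same row, and a row can hold up to 2 billion cells. What happens if one of my customers exceeds this number? I have also read that we can use a composite key, for example by putting the current date (20150313) in the second part of the key, so that only the tenant's new profiles for that day are grouped in one row: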

CREATE TABLE profiles (
  ...
  date text,
  PRIMARY KEY((tenant, date), id)
);

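But with this solution it is impossible to query all the data (with no date in the query).

Also, as you can see in my schema, I use secondary indexes on the "email" and "datasources" fields. But I read here, http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_when_use_index_c.html, that using a secondary index on a huge table for a query that returns a small number of results (in my case, one) is bad practice. In my schema "datasources" is a set containing, for example, facebookId, twitterId, etc.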

If you have any ideas, I'm really interested :)! If there is something I have misunderstood, please tell me.

Thanks,

Donovan

Answer 1

With Cassandra, data duplication is not a problem, so you should approach data modeling starting from your queries.

So, I'm thinking of something like this:

CREATE TABLE profiles (
   id timeuuid,
   tenant text,
   email text,
   datasources set<text>,
   info map<text, text>,
   friends set<timeuuid>,
   PRIMARY KEY((id, tenant))
);

Assuming the tenant is known at the application level, this schema gives you the following fast query:

SELECT * FROM profiles WHERE id = x and tenant = y


CREATE TABLE profiles_emails (
   id timeuuid,
   tenant text,
   email text,
   datasources set<text>,
   info map<text, text>,
   friends set<timeuuid>,
   PRIMARY KEY((email, tenant))
);

SELECT * FROM profiles_emails WHERE email = x and tenant = y


CREATE TABLE profiles_tenants (
   id timeuuid,
   tenant text,
   email text,
   datasources set<text>,
   info map<text, text>,
   friends set<timeuuid>,
   PRIMARY KEY((tenant, id))
);

SELECT * FROM profiles_tenants WHERE tenant = x and id = y

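CREATE TABLE tenants (
   id timeuuid,
   tenant text,
   date text,  -- creation day, e.g. 20150313
   email text,
   datasources set<text>,
   info map<text, text>,
   friends set<timeuuid>,
   -- tenant is the partition key; date is a clustering column so it can be range-filtered
   PRIMARY KEY(tenant, date, id)
);

SELECT * FROM tenants WHERE tenant = x and date < y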

Alternatively, take a look at paging: http://www.datastax.com/documentation/cql/3.0/cql/cql_using/paging_c.html
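As a rough sketch of that kind of paging, assuming the profiles table keyed on (id, tenant) from above: each page resumes from the token of the last row returned by the previous page. The uuid, the tenant name 'acme' and the page size are made-up values for illustration.

```cql
-- First page: no starting point yet.
SELECT * FROM profiles LIMIT 100;

-- Next page: resume after the last (id, tenant) pair seen on the previous page.
SELECT * FROM profiles
 WHERE token(id, tenant) > token(e4f3c0a0-c9a1-11e4-8731-1681e6b88ec1, 'acme')
 LIMIT 100;
```

Note that this walks the whole table in token order, so rows from every tenant come back and any per-tenant filtering would have to happen in the application.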

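For searches based on "datasources", you can use a different system such as Elasticsearch or Solr. Or, if the set only holds a limited number of value types, you can maintain a separate table for each one.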
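A minimal sketch of the separate-table idea, assuming one of the datasource types is a Facebook id. The table name profiles_facebook, the column facebook_id and the literal values are hypothetical and not part of the original schema.

```cql
-- Hypothetical lookup table for a single datasource type (Facebook).
CREATE TABLE profiles_facebook (
   facebook_id text,
   tenant text,
   id timeuuid,
   PRIMARY KEY((facebook_id, tenant))
);

-- Resolve the profile id first, then read the full profile from profiles.
SELECT id FROM profiles_facebook
 WHERE facebook_id = '1234567890' AND tenant = 'acme';
```

This replaces the CONTAINS lookup on the secondary index with a single-partition read, at the cost of one extra write per datasource.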

Cassandra writes are fast and data duplication is not a problem, so you can write to all of these tables in a batch.
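For example, a batched write to the three profile tables might look like the following. The uuid, tenant and email are placeholder values; only a few columns are shown for brevity.

```cql
BEGIN BATCH
  INSERT INTO profiles (id, tenant, email)
  VALUES (e4f3c0a0-c9a1-11e4-8731-1681e6b88ec1, 'acme', 'donovan@example.com');

  INSERT INTO profiles_emails (id, tenant, email)
  VALUES (e4f3c0a0-c9a1-11e4-8731-1681e6b88ec1, 'acme', 'donovan@example.com');

  INSERT INTO profiles_tenants (id, tenant, email)
  VALUES (e4f3c0a0-c9a1-11e4-8731-1681e6b88ec1, 'acme', 'donovan@example.com');
APPLY BATCH;
```

The same id is written as a literal in every statement, because calling now() in each INSERT would generate a different timeuuid per table.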

You also have to consider the consistency level, which has an impact on READ performance. It really depends on your use case.
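In cqlsh, for instance, the consistency level can be switched per session with the CONSISTENCY command, which makes it easy to compare read behaviour. This is only a sketch; the uuid and tenant are placeholder values, and drivers expose the same setting per statement.

```cql
-- Fastest read: a single replica answers.
CONSISTENCY ONE;
SELECT * FROM profiles
 WHERE id = e4f3c0a0-c9a1-11e4-8731-1681e6b88ec1 AND tenant = 'acme';

-- Stronger read: a majority of replicas must answer, so latency is typically higher.
CONSISTENCY QUORUM;
SELECT * FROM profiles
 WHERE id = e4f3c0a0-c9a1-11e4-8731-1681e6b88ec1 AND tenant = 'acme';
```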
