Question

CREATE TABLE users ( 
userID uuid, 
firstname text, 
lastname text, 
state text, 
zip int,
age int,
PRIMARY KEY (userID) 
);

I want to build the following queries:

select * from users where age between 30 and 40

select * from users where state in ('AZ', 'WA')

I know I need two more tables to run these queries, but I don't know how they should be designed.

EDIT

From Carlo's comments, I see this is the only possibility:

CREATE TABLE users ( 
userID uuid, 
firstname text, 
lastname text, 
state text, 
zip int,
age int,
PRIMARY KEY (age,zip,userID) 
);

Now, to select users with age between 15 and 30, this is the only possibility:

select * from users where age IN (15,16,17,....30)

However, using the IN operator here is not recommended and is an anti-pattern.

What about creating a secondary index on age?

CREATE index users_age ON users(age)

Would this help?

Thanks

Answer 1

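Range queries are a tricky matter. The way to perform a real range query is to use a compound primary key and apply the range to the clustering part. Since the range is on the clustering part, you can't run the queries you wrote: you need at least an equality condition on the whole partition key. Let's look at an example: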

CREATE TABLE users (
  mainland text,
  state text,
  uid int,
  name text,
  zip int,
  PRIMARY KEY ((mainland), state, uid)
);

uid is an int here just to make testing easier.

insert into users (mainland, state, uid, name, zip) VALUES ( 'northamerica', 'washington', 1, 'john', 98100);
insert into users (mainland, state, uid, name, zip) VALUES ( 'northamerica', 'texas', 2, 'lukas', 75000);
insert into users (mainland, state, uid, name, zip) VALUES ( 'northamerica', 'delaware', 3, 'henry', 19904);
insert into users (mainland, state, uid, name, zip) VALUES ( 'northamerica', 'delaware', 4, 'dawson', 19910);
insert into users (mainland, state, uid, name, zip) VALUES ( 'centraleurope', 'italy', 5, 'fabio', 20150);
insert into users (mainland, state, uid, name, zip) VALUES ( 'southamerica', 'argentina', 6, 'alex', 10840);

Now this query does what you need:

 select * from users where mainland = 'northamerica' and state > 'ca' and state < 'ny';

Output

 mainland     | state    | uid | name   | zip
--------------+----------+-----+--------+-------
 northamerica | delaware |   3 |  henry | 19904
 northamerica | delaware |   4 | dawson | 19910

If you put an int (age, zipcode) as the first column of the clustering key, you can run the same kind of query comparing integers.
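To make the mechanics concrete, here is a minimal Python sketch of a toy in-memory model (it is not how Cassandra actually stores data): each partition key maps to a list of rows kept sorted by the clustering columns, so a range on `state` becomes a contiguous slice of one partition. The names `table`, `insert` and `range_query` are made up for this illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Toy model: partition key (mainland) -> rows sorted by clustering columns (state, uid).
table = defaultdict(list)

def insert(mainland, state, uid, name, zip_code):
    rows = table[mainland]
    rows.append((state, uid, name, zip_code))
    rows.sort()  # keep clustering order inside the partition

def range_query(mainland, state_gt, state_lt):
    """Rows of ONE partition with state_gt < state < state_lt."""
    rows = table.get(mainland, [])
    states = [row[0] for row in rows]
    lo = bisect_right(states, state_gt)  # first row with state > state_gt
    hi = bisect_left(states, state_lt)   # first row with state >= state_lt
    return rows[lo:hi]

insert('northamerica', 'washington', 1, 'john', 98100)
insert('northamerica', 'texas', 2, 'lukas', 75000)
insert('northamerica', 'delaware', 3, 'henry', 19904)
insert('northamerica', 'delaware', 4, 'dawson', 19910)
insert('centraleurope', 'italy', 5, 'fabio', 20150)
insert('southamerica', 'argentina', 6, 'alex', 10840)

for row in range_query('northamerica', 'ca', 'ny'):
    print(row)
```

The point of the model is that the partition key must be known exactly (it selects the list), while the range only narrows down a sorted slice inside that list.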

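Take care: most people, when looking at this situation, start thinking "OK, I can add a fake partition key that is always the same, and then I can run range queries". This is a huge mistake: the partition key is responsible for distributing data across nodes. Setting a fixed partition key means all the data will end up on the same node (and on its replicas).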
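A rough way to see the effect is the Python sketch below. It assumes a deliberately simplified placement rule (hash the key, take it modulo the number of nodes); Cassandra's real partitioner and token ring work differently, so treat `node_for` and `spread` as hypothetical helpers that only illustrate the idea.

```python
import hashlib
from collections import Counter

def node_for(partition_key, node_count):
    # Assumed placement rule for illustration only: hash modulo node count.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count

def spread(keys, node_count=4):
    """Count how many rows land on each node."""
    return Counter(node_for(key, node_count) for key in keys)

# One constant ("fake") partition key: every row lands on the same node.
fixed = spread(["all"] * 1000)
# One partition key per user: rows are spread over the nodes.
varied = spread([f"user-{i}" for i in range(1000)])

print("fixed key :", dict(fixed))
print("varied key:", dict(varied))
```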

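Dividing the world into 15/20 zones (in order to have 15/20 partition keys) is something, but it is not enough; I did it only to build a working example.

EDIT: due to the question's edit

I did not say this is the only possibility; if you can't find a valid way to partition your users and you need to run this kind of query, this is one possibility, not the only one. Range queries should be performed on the clustering key part. A weak point of AGE as part of the key is that you can't update it: any time you need to update a user's age you have to do a delete and an insert (an alternative is to store birth_year / birth_date instead of age and compute the age client side).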
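The birth_date alternative could look roughly like this on the client. This is only a sketch: `age_on` is a hypothetical helper name, and it assumes the stored value is a plain calendar date.

```python
from datetime import date

def age_on(birth_date, today):
    """Full years elapsed between birth_date and today."""
    years = today.year - birth_date.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

print(age_on(date(1990, 6, 15), date(2020, 6, 14)))
print(age_on(date(1990, 6, 15), date(2020, 6, 15)))
```

Since the birth date never changes, the row never needs to be deleted and re-inserted just because the user got older.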

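To answer your question about adding a secondary index: queries on a secondary index actually do not support the IN operator. Judging from the CQL message, it looks like they are going to develop it soon:

Bad Request: IN predicates on non-primary-key columns (xxx) is not yet supported

However, even if secondary indexes supported the IN operator, your query would not change from: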

select * from users where age IN (15,16,17,....30)

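Just to clarify my point: anything that does not have a "clean" and "ready" solution requires effort from the user to model the data in a way that suits their needs. As an example (I am not saying this is a good solution; I would not use it):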

CREATE TABLE users (
  years_range text,
  age int,
  uid int,
  PRIMARY KEY ((years_range), age, uid)
);

Insert some data:

insert into users (years_range, age , uid) VALUES ( '11_15', 14, 1);
insert into users (years_range, age , uid) VALUES ( '26_30', 28, 3);
insert into users (years_range, age , uid) VALUES ( '16_20', 16, 2);
insert into users (years_range, age , uid) VALUES ( '26_30', 29, 4);
insert into users (years_range, age , uid) VALUES ( '41_45', 41, 5);
insert into users (years_range, age , uid) VALUES ( '21_25', 23, 5);

Query the data:

select * from users where years_range in('11_15', '16_20', '21_25', '26_30') and age > 14 and age < 29;

Output

 years_range | age | uid
-------------+-----+-----
       16_20 |  16 |   2
       21_25 |  23 |   5
       26_30 |  28 |   3

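This solution might solve your problem and can be used in a small cluster, where about 20 keys (0_5 ... 106_110) may give a good distribution. But this solution, like the previous one, does not allow an UPDATE and reduces the distribution of keys. The advantage is that you have a small IN set.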
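The application has to work out which years_range values to put in the IN clause. A minimal sketch, assuming the 5-year labels used in the example above with a first bucket of 0_5 (the function names are made up for this illustration):

```python
def years_range(age):
    """Bucket label for an age, e.g. 14 -> '11_15'."""
    if age <= 5:
        return "0_5"
    low = ((age - 1) // 5) * 5 + 1
    return f"{low}_{low + 4}"

def ranges_for(min_age, max_age):
    """Distinct bucket labels covering min_age..max_age (inclusive), in order."""
    labels = []
    for age in range(min_age, max_age + 1):
        label = years_range(age)
        if label not in labels:
            labels.append(label)
    return labels

# age > 14 and age < 29 means ages 15..28
print(ranges_for(15, 28))
```

For ages 15 to 28 this yields the same four partition keys used in the query above.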

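In a perfect world where SI already allowed the IN clause, I would use the UUID as the partition key, years_range (stored as birth_year_range) as the SI, and "filter" my data client side (if interested in ages between 10 and 22, I would ask for IN('1991_1995', '1996_2000', '2001_2005', '2006_2010', '2011_2015'), then compute the ages and drop the useless years in my application).

HTH, Carlo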
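The client-side part of that idea might be sketched as follows. Everything here is an assumption made for illustration: the 5-year birth_year_range labels, the row shape (a dict with a `birth_year` field), the helper names, and the approximation of age as current year minus birth year.

```python
def birth_year_range(year):
    """Label of the 5-year bucket containing year, e.g. 1993 -> '1991_1995'."""
    low = ((year - 1) // 5) * 5 + 1
    return f"{low}_{low + 4}"

def ranges_to_request(min_age, max_age, current_year):
    """Bucket labels to put in the IN clause for an age interval."""
    labels = []
    for year in range(current_year - max_age, current_year - min_age + 1):
        label = birth_year_range(year)
        if label not in labels:
            labels.append(label)
    return labels

def keep_in_age_range(rows, min_age, max_age, current_year):
    """Drop rows fetched from the buckets that fall outside the wanted ages."""
    return [row for row in rows
            if min_age <= current_year - row["birth_year"] <= max_age]

fetched = [{"uid": 1, "birth_year": 1991},
           {"uid": 2, "birth_year": 1995},
           {"uid": 3, "birth_year": 2005}]
print(ranges_to_request(10, 22, 2014))
print(keep_in_age_range(fetched, 10, 22, 2014))
```

The buckets always return a few rows just outside the requested ages, which is why the final filtering step happens in the application.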

Answer 2

I found that with allow filtering I can run range queries. Here is an example:

 CREATE TABLE users2 (
  mainland text,
  state text,
  uid int,
  name text,
  age int,
  PRIMARY KEY (uid, age, state)
) ;

insert into users2 (mainland, state, uid, name, age) VALUES ( 'northamerica', 'washington', 1, 'john', 81);
insert into users2 (mainland, state, uid, name, age) VALUES ( 'northamerica', 'texas', 1, 'lukas', 75);
insert into users2 (mainland, state, uid, name, age) VALUES ( 'northamerica', 'delaware', 1, 'henry', 19);
insert into users2 (mainland, state, uid, name, age) VALUES ( 'northamerica', 'delaware', 4, 'dawson', 90);
insert into users2 (mainland, state, uid, name, age) VALUES ( 'centraleurope', 'italy', 5, 'fabio', 50);
insert into users2 (mainland, state, uid, name, age) VALUES ( 'southamerica', 'argentina', 6, 'alex', 40);

select * from users2 where age>50 and age<=100 allow filtering;

 uid | age | state      | mainland     | name
-----+-----+------------+--------------+--------
   1 |  75 |      texas | northamerica |  lukas
   1 |  81 | washington | northamerica |   john
   2 |  75 |      texas | northamerica |  lukas
   4 |  90 |   delaware | northamerica | dawson

(4 rows)

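I am not sure how badly this hurts performance, but it seems to work. In fact, I don't even have to provide the primary key column uid when running the query.

How to construct range queries in Cassandra?

2 answers: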