Question

I have a table that stores users, and I want to query all users sorted by their score. What is the most efficient way to achieve this?

Note: I care about performance too.

If Cassandra can't do this, can I use something like Apache Solr integrated with Cassandra?

Answer 1

Within a partition, Cassandra stores data in sorted order, so you can create a table like this:

CREATE TABLE sorted_users (user_type INT, user_id UUID, score INT,
    PRIMARY KEY (user_type, score, user_id)) WITH CLUSTERING ORDER BY (score DESC);

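When you insert users into the table, set user_type to 1 so that all users land in the same partition. The score column is a clustering column, so the rows are kept sorted in descending order of score. You can then read users out efficiently in sorted order, or run range queries on the score column. A single partition can hold up to 2 billion cells (not rows), so this layout works only while the number of users stays well within that limit.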

You would probably have another table, keyed by user_id, that holds all the user details, and use this table only when you want to query by score.
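The two-table layout above can be sketched in Python as follows. This is a minimal sketch under assumptions, not the answerer's code: `session` is assumed to behave like a DataStax Python driver session (`execute(query, parameters)` returning rows with attribute access), and the `users` details table with its `name` column is hypothetical.

```python
# Sketch only: `session` is assumed to behave like a DataStax Python driver
# Session. The `users` table and its `name` column are hypothetical.

TOP_SCORES_CQL = (
    "SELECT user_id, score FROM sorted_users WHERE user_type = 1 LIMIT %s"
)
USER_DETAILS_CQL = "SELECT user_id, name FROM users WHERE user_id = %s"


def top_users_with_details(session, limit=10):
    """Return (score, details_row) pairs for the highest-scoring users."""
    results = []
    # Rows come back already sorted by score DESC (the clustering order).
    for ranked in session.execute(TOP_SCORES_CQL, (limit,)):
        # One single-partition lookup per user in the details table.
        for details in session.execute(USER_DETAILS_CQL, (ranked.user_id,)):
            results.append((ranked.score, details))
            break  # user_id is the primary key, so at most one row matters
    return results
```

The ranking table stays narrow (only the key columns), and the details are fetched by primary key only for the handful of users actually displayed.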

To get the top 10 users, you can run:

SELECT user_id, score FROM sorted_users LIMIT 10;

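To update a user's score, you need to delete the row with the old score and insert a row with the new score, because you cannot update primary key fields directly.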
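The delete-then-insert step can be sketched like this. It is an illustration under assumptions: `session` is assumed to behave like a DataStax Python driver session, the caller is assumed to know the user's current score (for example from the details table), and `change_score` is a hypothetical helper name. Both statements touch the same partition, so they are grouped into one CQL batch.

```python
# Sketch only: `session` is assumed to behave like a DataStax Python driver
# Session; table and column names follow the sorted_users definition above.

MOVE_SCORE_CQL = """
BEGIN BATCH
  DELETE FROM sorted_users WHERE user_type = 1 AND score = %s AND user_id = %s;
  INSERT INTO sorted_users (user_type, score, user_id) VALUES (1, %s, %s);
APPLY BATCH
"""


def change_score(session, user_id, old_score, new_score):
    """Move a user's row from old_score to new_score; return True if written."""
    if old_score == new_score:
        # Nothing to move. Skipping also avoids deleting and re-inserting the
        # same row in one batch, where the delete is expected to win the
        # timestamp tie and the row would disappear.
        return False
    session.execute(MOVE_SCORE_CQL, (old_score, user_id, new_score, user_id))
    return True
```

Passing a stale `old_score` would leave the old row in place and add a second one, so the details table (or wherever the current score lives) needs to be kept in step with this table.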

Answer 2

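Most likely:

You will have PRIMARY KEY (user_id) (the user_id is probably specific to your domain/application)
where user_id is the partition key (the node that stores a partition is determined by applying Cassandra's hash function (Murmur3) to the partition key value)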

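3.1. You could make score a clustering column (the column by which data inside a partition is sorted), but since no two users can share the same ID, each partition would hold a single row and the ordering would not make much sense

3.2. So you cannot ask for all users sorted by score, because the users are spread across Cassandra's nodes

3.3. If you run select * from users order by score;, you will get back Bad Request: ORDER BY is only supported when the partition key is restricted by an EQ or an IN. (which proves 3.2.)

3.4. Of course, you can still run select * from users, but you will need to sort the rows manually in your application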
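The application-side sort from 3.4 could look roughly like the sketch below. It assumes `session` behaves like a DataStax Python driver session and that rows from the `users` table expose `user_id` and `score` attributes; `users_sorted_by_score` is a hypothetical helper name.

```python
# Sketch only: `session` is assumed to behave like a DataStax Python driver
# Session whose rows expose `user_id` and `score` attributes.

def users_sorted_by_score(session, limit=None):
    """Read every user and sort by score, highest first, in the application."""
    rows = session.execute("SELECT user_id, score FROM users")
    # Cassandra returns rows in token order here, so the sort happens client-side.
    ranked = sorted(rows, key=lambda row: row.score, reverse=True)
    return ranked if limit is None else ranked[:limit]
```

This reads the whole table and holds it in memory, so its cost grows with the number of users; it is reasonable for small tables but not for large ones.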

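Regarding Solr, I cannot say for sure, but as far as I know Spark is commonly used for this purpose (it offers more query capabilities and keeps as much data in memory as possible). You can take a look at the official connector from DataStax: https://github.com/datastax/spark-cassandra-connector.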

How to store data in Cassandra to query all records sorted by one column?
