Question

I want to query a Cassandra table with the following schema:

  CREATE TABLE IF NOT EXISTS mykeyspace.user (
    id text,
    login text,
    password text,
    firstname text,
    lastname text,
    email text,
    PRIMARY KEY (id)
  );

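I want to query this table by login and firstname, which are obviously not primary-key columns. I read somewhere that Spark is very useful in these scenarios, so I would like to know how to use Spark to query Cassandra on non-primary-key columns.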

I am also using Java to query the database.

Answer 1

The simplest solution is to use a JDBC connector (for example, the one made by Progress).

Spark's JDBC support is well documented.

You can then use Spark DataFrames to query and work with the Cassandra table, for example:

df = spark.read.jdbc('jdbc:cassandra:dbserver', 'mykeyspace.user', properties=connectionProperties).filter("login = 'foo' AND firstname = 'bar'")

(Sorry, my example is in Python, but the Java API is almost the same.)
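To make the cost of that filter concrete: whether the predicate is evaluated by Spark or pushed down to the driver, neither login nor firstname is part of the primary key, so matching rows can only be found by reading every row of the table. The sketch below is a toy in-memory model in plain Python, not Spark or Cassandra; the sample `rows` and the `scan_filter` helper are hypothetical and exist only to show that a full scan examines every row.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def scan_filter(rows: List[Row], login: str, firstname: str) -> Tuple[List[Row], int]:
    """Toy full-table scan (hypothetical helper): test every row against both predicates.

    Returns the matching rows and how many rows were examined.
    """
    matches: List[Row] = []
    examined = 0
    for row in rows:
        examined += 1  # every row must be read; there is no key to jump to
        if row["login"] == login and row["firstname"] == firstname:
            matches.append(row)
    return matches, examined


# Hypothetical sample data standing in for mykeyspace.user
rows = [
    {"id": "1", "login": "foo", "firstname": "bar", "lastname": "x"},
    {"id": "2", "login": "foo", "firstname": "baz", "lastname": "y"},
    {"id": "3", "login": "qux", "firstname": "bar", "lastname": "z"},
]

matches, examined = scan_filter(rows, "foo", "bar")
print([r["id"] for r in matches], examined)  # ['1'] 3
```

Only one row matches, yet all three were read; on a real table the examined count grows with the table size, which is why this approach suits batch jobs better than point lookups.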

Answer 2

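Spark is intended for bulk operations, such as scanning an entire table or joining it with other tables. In your case it is better to use a secondary index or a materialized view: https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateIndex.html https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateMaterializedView.html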

So, with an index on the login column:

CREATE INDEX ON mykeyspace.user (login);
select * from mykeyspace.user where login = 'a';
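Conceptually, a secondary index is a separate lookup structure from an indexed value to the primary keys of the rows holding it, so a query on login reads only the rows listed under that value instead of every row. The sketch below is a toy Python model of that idea, not Cassandra's actual index implementation (it ignores how data and indexes are spread across nodes); `SecondaryIndexedTable` and its methods are hypothetical. The original question also restricts firstname: with only login indexed, Cassandra would additionally require ALLOW FILTERING for the firstname condition, which the model mirrors by filtering the index hits in memory.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]


class SecondaryIndexedTable:
    """Hypothetical toy table keyed by id, plus an index from login -> set of ids."""

    def __init__(self) -> None:
        self.rows: Dict[str, Row] = {}  # primary key (id) -> row
        self.login_index: Dict[str, Set[str]] = defaultdict(set)

    def insert(self, row: Row) -> None:
        old = self.rows.get(row["id"])
        if old is not None:
            # keep the index consistent when an existing row's login changes
            self.login_index[old["login"]].discard(row["id"])
        self.rows[row["id"]] = row
        self.login_index[row["login"]].add(row["id"])

    def find(self, login: str, firstname: str) -> Tuple[List[Row], int]:
        """Use the index for login, then filter firstname on the hits only.

        Returns the matches and how many rows were actually read.
        """
        ids = sorted(self.login_index.get(login, set()))
        candidates = [self.rows[i] for i in ids]
        matches = [r for r in candidates if r["firstname"] == firstname]
        return matches, len(candidates)


table = SecondaryIndexedTable()
for row in [
    {"id": "1", "login": "foo", "firstname": "bar"},
    {"id": "2", "login": "foo", "firstname": "baz"},
    {"id": "3", "login": "qux", "firstname": "bar"},
]:
    table.insert(row)

matches, read = table.find("foo", "bar")
print([r["id"] for r in matches], read)  # ['1'] 2
```

The index has to be updated on every write, which is the trade-off for cheaper reads; a materialized view makes the same trade by maintaining a second copy of the table keyed by login.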

Querying non-primary-key columns in Cassandra with Spark in Java
