Spark Cassandra Connector - where clause

Time: 2016-05-03 09:12:17

Tags: scala spark-cassandra-connector

I'm trying to do a select with a where clause using the DataStax Cassandra Connector, but I got the following error:

java.io.IOException: Exception during preparation of SELECT "path" FROM "tracking"."user_page_action" WHERE token("user_id") > ? AND token("user_id") <= ? AND user_id = ? ALLOW FILTERING: user_id cannot be restricted by more than one relation if it includes an Equal

I really don't understand why the connector adds the other restrictions.

This is what I'm trying to read:

spark.cassandraTable(keySpace,table).select(column).where(whereColumn + " = ?", whereColumnValue).collect()

just like in their documentation.

user_id is the table's primary key. I also tried the same select with the where clause in the terminal, and it works.

I looked at similar questions, but they didn't help:

Dataframe where clause doesn't work when use spark cassandra connector

Spark Cassandra connector - where clause

1 answer:

Answer 0: (score: 0)

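As you noticed, the spark-cassandra-connector adds a range restriction on the token. Normally the connector splits your query into several queries by token range, so that each one runs against a replica that owns that range, which ensures data locality. In your case, you are supplying the full partition key with user_id = value (arguably Spark is not the right tool in that case, but I don't know what your application is doing). There has been some discussion in the Spark-Cassandra-Connector project about addressing this; I don't know whether it was ever done.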

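However, if you switch to Cassandra 2.2 or 3 (I assume you are running Cassandra 2.1), Cassandra will accept the generated query, that is, a query whose partition key is restricted by both an equality and a range. I tested it on 2.2.6 and 3.0.5.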
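
To make the token-range restriction concrete, here is a minimal sketch of how the query in the error message comes to have that shape. It is plain Scala with no Spark or Cassandra dependency, and it only mimics the form of the generated CQL; the object TokenRangeQuerySketch and the helper buildQuery are hypothetical names made up for this illustration, not the connector's actual code.

```scala
// Illustration only: reproduces the *shape* of the CQL seen in the error
// message. This is NOT the connector's real implementation.
object TokenRangeQuerySketch {

  // Double-quote an identifier, as it appears in the error message.
  private def quote(name: String): String = "\"" + name + "\""

  /** Hypothetical helper: build the CQL issued for one token range. */
  def buildQuery(
      keyspace: String,
      table: String,
      columns: Seq[String],
      partitionKey: String,
      userWhere: Option[String]
  ): String = {
    val cols = columns.map(quote).mkString(", ")
    val pk = quote(partitionKey)
    // One such range restriction is added per Spark partition.
    val tokenRange = s"token($pk) > ? AND token($pk) <= ?"
    // The user's own predicate is appended after the token range.
    val where = (tokenRange +: userWhere.toSeq).mkString(" AND ")
    s"SELECT $cols FROM ${quote(keyspace)}.${quote(table)} WHERE $where ALLOW FILTERING"
  }

  def main(args: Array[String]): Unit = {
    val cql = buildQuery(
      "tracking", "user_page_action", Seq("path"), "user_id", Some("user_id = ?"))
    println(cql)
  }
}
```

With the user predicate user_id = ?, the resulting statement restricts user_id both through token(...) and through an equality, which is the combination the error message complains about.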