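I have a large table that keeps growing in row count. I want to read rows in small batches so that I can process each row and save the results.
Table definition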
CREATE TABLE foo (
uid timeuuid,
events blob,
PRIMARY KEY ((uid))
)
// Step 1. Get uuid of the last row in a batch
val max = 10
val rdd = sc.cassandraTable("foo", "bar")
var cassandraRows = rdd.take(max)
var lastUUID = cassandraRows.last.getUUID("uid");
// lastUUID = 131ea620-2e4e-11e4-a2fc-8d5aad979e84
// Step 2. Use last row as a pointer to the start of the next batch
val cc = new CassandraSQLContext(sc)
val cql = s"SELECT events from foo.bar where token(uid) > token($lastUUID) limit $max"
// which is at runtime
// SELECT events from foo.bar WHERE
// token(uid) > token(131ea620-2e4e-11e4-a2fc-8d5aad979e84) limit 10
cc.sql(cql).collect()
The last line throws
Exception in thread "main" java.lang.RuntimeException: [1.79] failure: ``)'' expected but identifier ea620 found
SELECT events from foo.bar where token(uid) > token(131ea620-2e4e-11e4-a2fc-8d5aad979e84) limit 10
^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(SparkSQLParser.scala:33)
at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:79)
at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:79)
But if I run the same CQL in cqlsh, it returns the correct 10 records.
// Step 1. Get uuid of the last row in a batch
val max = 10
val rdd = sc.cassandraTable("foo", "bar")
var cassandraRows = rdd.take(max)
var lastUUID = cassandraRows.last.getUUID("uid");
// lastUUID = 131ea620-2e4e-11e4-a2fc-8d5aad979e84
// Step 2. Execute query
rdd.where(s"token(uid) > token($lastUUID)").take(max)
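throws
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost): java.io.IOException: Exception during preparation of SELECT "uid", "events" FROM "foo"."bar" WHERE token("uid") > ? AND token("uid") <= ? AND uid > $lastUUID ALLOW FILTERING: line 1:118 no viable alternative at character '$'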
How can I use a where token(...) query with Spark and Cassandra?
Answer 0 (score: 0)
I would use the DataStax Cassandra Java Driver. Similar to your CassandraSQLContext, you would select chunks like this:
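import com.datastax.driver.core.querybuilder.QueryBuilder
import com.datastax.driver.core.querybuilder.QueryBuilder.{fcall, gt, token}
// `session` is an open com.datastax.driver.core.Session
val query = QueryBuilder.select("events").from("foo", "bar")
  .where(gt(token("uid"), fcall("token", lastUUID))) // token(uid) > token(<lastUUID>)
  .limit(10)
val rows = session.execute(query).all()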
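To read the whole table in chunks, the query above can be repeated, each time starting after the last uid of the previous chunk. The following is only a sketch, assuming the 2.x/3.x Java driver's QueryBuilder, the foo.bar table from the question, and an already opened Session. The names TokenPaging, fetchBatch and foreachBatch are hypothetical and not part of any library. Because uid is the whole partition key, each partition holds one row, so continuing from token(uid) should not skip rows except in the unlikely case of two uids sharing a token.

```scala
import com.datastax.driver.core.{Row, Session}
import com.datastax.driver.core.querybuilder.QueryBuilder
import com.datastax.driver.core.querybuilder.QueryBuilder.{fcall, gt, token}
import java.util.UUID
import scala.collection.JavaConverters._

object TokenPaging {
  // Hypothetical helper: one page of rows whose token is greater than
  // the token of `last`, or the first page when `last` is None.
  def fetchBatch(session: Session, last: Option[UUID], size: Int): Seq[Row] = {
    val base = QueryBuilder.select("uid", "events").from("foo", "bar")
    val stmt = last match {
      case Some(uid) => base.where(gt(token("uid"), fcall("token", uid))).limit(size)
      case None      => base.limit(size)
    }
    session.execute(stmt).all().asScala.toSeq
  }

  // Generic loop: fetch a page, process it, then continue from the
  // key of the last row seen until an empty page comes back.
  def foreachBatch[R, K](fetch: Option[K] => Seq[R], keyOf: R => K)(process: Seq[R] => Unit): Unit = {
    var last: Option[K] = None
    var page = fetch(last)
    while (page.nonEmpty) {
      process(page)
      last = Some(keyOf(page.last))
      page = fetch(last)
    }
  }
}
```

A call might look like `TokenPaging.foreachBatch[Row, UUID](TokenPaging.fetchBatch(session, _, 10), _.getUUID("uid")) { rows => rows.foreach(println) }`. Note that uid is selected alongside events so that the next chunk has a starting point.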
If you want to query asynchronously, session also has executeAsync, which returns a ResultSetFuture (a Guava ListenableFuture) that can be wrapped in a Scala Future by adding a callback.
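One way to do that wrapping is to complete a Scala Promise from a Guava FutureCallback. This is a minimal sketch, assuming the future returned by executeAsync is a Guava ListenableFuture (as the driver's ResultSetFuture is in the 2.x/3.x line) and that Guava 18 or later is on the classpath for MoreExecutors.directExecutor(). GuavaInterop and asScala are hypothetical names chosen for illustration.

```scala
import com.google.common.util.concurrent.{FutureCallback, Futures, ListenableFuture, MoreExecutors}
import scala.concurrent.{Future, Promise}

object GuavaInterop {
  // Registers a callback on the Guava future that completes a Scala
  // Promise, and returns the Promise's Future to the caller.
  def asScala[T](lf: ListenableFuture[T]): Future[T] = {
    val p = Promise[T]()
    Futures.addCallback(lf, new FutureCallback[T] {
      override def onSuccess(result: T): Unit = p.success(result)
      override def onFailure(t: Throwable): Unit = p.failure(t)
    }, MoreExecutors.directExecutor()) // callback runs on the completing thread
    p.future
  }
}
```

With this in place, `GuavaInterop.asScala(session.executeAsync(query))` would give a Scala Future of the driver's ResultSet that can be combined with map, flatMap and the rest of the Scala Future API.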