While performing multiple selects inside mapPartitions, I prepare 2 requests per row. The code looks like this:
source.mapPartitions { partition =>
  lazy val prepared: PreparedStatement = ...
  cc.withSessionDo { session =>
    partition.map { row =>
      session.execute(prepared.bind(row.get("id")))
    }
  }
}
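As a side note, `partition.map` returns a lazy iterator, so I wonder whether the `session.execute` calls only actually run once Spark consumes the iterator, i.e. after `withSessionDo` has already released the session. A minimal plain-Scala sketch of that laziness (`LazyIteratorDemo` and this `withSessionDo` are toy stand-ins, not the connector API):

```scala
import scala.collection.mutable.ArrayBuffer

// Minimal plain-Scala sketch (no Cassandra involved) showing that
// Iterator.map is lazy: the mapped body only runs when the iterator
// is consumed, which here happens after the "session" is closed.
object LazyIteratorDemo {
  // Toy stand-in for cc.withSessionDo: open a session, run the body, close it.
  def withSessionDo[T](log: ArrayBuffer[String])(body: => T): T = {
    log += "session opened"
    val result = body
    log += "session closed"
    result
  }

  def run(): List[String] = {
    val log = ArrayBuffer.empty[String]
    val partition = Iterator(1, 2, 3)
    val results = withSessionDo(log) {
      partition.map { row =>
        log += s"execute($row)" // session.execute(...) in the real code
        row
      }
    }
    results.foreach(_ => ()) // the iterator is consumed only here, after "session closed"
    log.toList
  }

  def main(args: Array[String]): Unit = println(run().mkString(", "))
}
```

If that is the cause, materializing the results (e.g. `.toList`) inside `withSessionDo` would force the executes while the session is still open, but I have not confirmed this.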
When a batch reaches ~400 rows, it throws:
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /localhost:9042 (com.datastax.driver.core.ConnectionException: [/localhost:9042] Pool is CLOSING))
at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:216)
at com.datastax.driver.core.RequestHandler.access$900(RequestHandler.java:45)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:276)
at com.datastax.driver.core.RequestHandler.startNewExecution(RequestHandler.java:118)
at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:94)
at com.datastax.driver.core.SessionManager.execute(SessionManager.java:552)
at com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:589)
at com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:97)
... 25 more
I tried changing the configuration to see whether anything would help, but the error keeps appearing:
.set("spark.cassandra.output.batch.size.rows", "auto")
.set("spark.cassandra.output.concurrent.writes", "500")
.set("spark.cassandra.output.batch.size.bytes", "100000")
.set("spark.cassandra.read.timeout_ms", "120000")
.set("spark.cassandra.connection.timeout_ms" , "120000")
Is this kind of code workable with the spark-cassandra-connector, or is there something I am missing?
After the exception is thrown, the next streaming batch connects to Cassandra without any problem.
Am I stalling Cassandra with too many simultaneous requests?
I am using Cassandra 2.1.3 with spark-cassandra-connector 1.4.0-M3 and driver 2.1.7.1.