NoHostAvailableException - Spark-Cassandra-Connector

Date: 2018-06-07 14:05:31

Tags: cassandra spark-cassandra-connector

I'm using spark-cassandra-connector_2.11 version 2.3.0, running the latest Spark 2.3.0 and trying to read data from Cassandra 3.0.11.1485 (DSE 5.0.5).

A sample read that works without problems:

 JavaRDD<Customer> result = javaFunctions(sc).cassandraTable(MyKeyspaceName, "customers", mapRowTo(Customer.class));

Another read that works fine: a single-threaded, single read from a unit test, as follows.

cassandraConnector.withSessionDo(new AbstractFunction1<Session, Void>() {
                @Override
                public Void apply(Session session) {
                    // Read something from Cassandra via the session - works fine here as well.
                    return null;
                }
            });

A sample read (mapPartitions + withSessionDo) that has problems:

CassandraConnector cassandraConnector = CassandraConnector.apply(sc.getConf());

SomeSparkRDD.mapPartitions((FlatMapFunction<Iterator<Customer>, CustomerEx>) customerIterator ->
            cassandraConnector.withSessionDo(new AbstractFunction1<Session, Iterator<CustomerEx>>() {
                @Override
                public Iterator<CustomerEx> apply(Session session) {
                    return asStream(customerIterator, false)
                            .map(customer -> fetchDataViaSession(customer, session))
                            .filter(x -> x != null)
                            .iterator();
                }
            }));
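One thing worth checking with this pattern: Java streams are lazy, so the `map(customer -> fetchDataViaSession(customer, session))` calls do not execute inside `apply(...)`. They run later, when Spark consumes the returned iterator, by which point `withSessionDo` may already have released the session. A minimal stdlib sketch of that laziness (no Spark or Cassandra involved; class and variable names are illustrative only):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyStreamDemo {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Build a lazy iterator over a mapped stream; nothing runs yet.
        Iterator<Integer> it = Arrays.asList(1, 2, 3).stream()
                .map(x -> { calls.incrementAndGet(); return x * 2; })
                .iterator();
        // The map function has not been invoked at this point.
        System.out.println(calls.get()); // prints 0
        // Consuming the iterator is what actually triggers the mapping.
        while (it.hasNext()) it.next();
        System.out.println(calls.get()); // prints 3
    }
}
```

If laziness is the culprit here, materializing the results (e.g. collecting to a list) inside the session scope before returning the iterator would avoid touching the session after it is released.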


public static <T> Stream<T> asStream(Iterator<T> sourceIterator, boolean parallel) {
    Iterable<T> iterable = () -> sourceIterator;
    return StreamSupport.stream(iterable.spliterator(), parallel);
}
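For reference, the `asStream` helper above is a standard iterator-to-stream bridge and can be exercised with any plain iterator; a self-contained stdlib example (the class name `AsStreamDemo` is just for illustration):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class AsStreamDemo {
    // Same helper as in the question: wrap an Iterator in a (sequential) Stream.
    public static <T> Stream<T> asStream(Iterator<T> sourceIterator, boolean parallel) {
        Iterable<T> iterable = () -> sourceIterator;
        return StreamSupport.stream(iterable.spliterator(), parallel);
    }

    public static void main(String[] args) {
        Iterator<String> it = Arrays.asList("a", "b", "c").iterator();
        List<String> upper = asStream(it, false)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
        System.out.println(upper); // prints [A, B, C]
    }
}
```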

Some iterations of map(customer -> fetchDataViaSession(customer, session)) succeed, but most fail with NoHostAvailableException.

Tried various settings, without success:

spark.cassandra.connection.connections_per_executor_max
spark.cassandra.connection.keep_alive_ms
spark.cassandra.input.fetch.size_in_rows
spark.cassandra.input.split.size_in_mb
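For context, these properties are set on the SparkConf before the context is created; a minimal sketch (the values below are illustrative placeholders, not recommendations from the original post):

```java
// Illustrative values only; tune for your own cluster and workload.
SparkConf conf = new SparkConf()
        .setAppName("cassandra-read")
        .set("spark.cassandra.connection.keep_alive_ms", "60000")
        .set("spark.cassandra.input.fetch.size_in_rows", "1000")
        .set("spark.cassandra.input.split.size_in_mb", "64");
JavaSparkContext sc = new JavaSparkContext(conf);
```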

I also tried reducing the number of partitions of the RDD on which I call mapPartitions + withSessionDo.

2 Answers:

Answer 0 (score: 0)

Check whether your Cassandra cluster has SSL enabled. If it does, I've seen the same error when the correct certificates are not configured.

Answer 1 (score: 0)

It looks like this fixed it:

.set("spark.cassandra.connection.keep_alive_ms", "1200000")