I am trying to connect Spark and Cassandra using spark-cassandra-connector. The connection is established, but as soon as I run operations on a JavaRDD I get the following error:
java.io.IOException: Failed to open native connection to Cassandra at {10.0.21.92}:9042
Here is the configuration and code I am using:
SparkConf sparkConf = new SparkConf()
        .setAppName("Data Transformation")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .setMaster("local[4]");
// serverIp is a placeholder for the Cassandra host address
sparkConf.set("spark.cassandra.connection.host", serverIp);
sparkConf.set("spark.cassandra.connection.port", "9042");
sparkConf.set("spark.cassandra.connection.timeout_ms", "5000");
sparkConf.set("spark.cassandra.read.timeout_ms", "200000");
sparkConf.set("spark.cassandra.auth.username", user_name);
sparkConf.set("spark.cassandra.auth.password", password);
JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
Below is the code that builds the JavaRDD I operate on:
CassandraJavaRDD<CassandraRow> cassandraRDD = CassandraJavaUtil.javaFunctions(sparkContext).cassandraTable(keySpaceName, tableName);
JavaRDD<GenericTriggerEntity> rdd = cassandraRDD.map(new Function<CassandraRow, GenericTriggerEntity>() {
    private static final long serialVersionUID = -165799649937652815L;

    @Override
    public GenericTriggerEntity call(CassandraRow row) throws Exception {
        GenericTriggerEntity genericTriggerEntity = new GenericTriggerEntity();
        if (row.getString("end") != null)
            genericTriggerEntity.setEnd(row.getString("end"));
        if (row.getString("key") != null)
            genericTriggerEntity.setKey(row.getString("key"));
        genericTriggerEntity.setKeyspacename(row.getString("keyspacename"));
        genericTriggerEntity.setPartitiondeleted(row.getString("partitiondeleted"));
        genericTriggerEntity.setRowdeleted(row.getString("rowDeleted"));
        genericTriggerEntity.setRows(row.getString("rows"));
        genericTriggerEntity.setStart(row.getString("start"));
        genericTriggerEntity.setTablename("tablename");
        genericTriggerEntity.setTriggerdate(row.getString("triggerdate"));
        genericTriggerEntity.setTriggertime(row.getString("triggertime"));
        genericTriggerEntity.setUuid(row.getUUID("uuid"));
        return genericTriggerEntity;
    }
});
These are the JavaRDD operations I am running:
JavaRDD<String> jsonDataRDDwords = rdd.flatMap(s -> Arrays.asList(SPACE.split((CharSequence) s)));
JavaPairRDD<String, Integer> jsonDataRDDones = jsonDataRDDwords.mapToPair(s -> new Tuple2<>(s, 1));
JavaPairRDD<String, Integer> jsonDataRDDcounts = jsonDataRDDones.reduceByKey((i1, i2) -> i1 + i2);
List<Tuple2<String, Integer>> jsonDatRDDoutput = jsonDataRDDcounts.collect();
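I even tried opening the port with telnet to the Cassandra server.
I am able to establish that connection, but I still get the exception above when reduceByKey executes.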
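The telnet check described above can also be scripted. The following is a minimal sketch, not part of the original post: the class name `PortCheck` and the method `isReachable` are hypothetical, and the default host and port are simply the ones from the error message. It uses only `java.net.Socket` from the standard library.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not from the original post): a scripted equivalent of the telnet test.
class PortCheck {
    // Returns true if a TCP connection to host:port can be opened within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are taken from the exception message; override them via arguments.
        String host = args.length > 0 ? args[0] : "10.0.21.92";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9042;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 5000));
    }
}
```

Like telnet, this only shows that the TCP port accepts connections. It says nothing about whether the Cassandra driver can complete its own handshake, so a `true` result here is compatible with the exception still being thrown.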
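I cannot pinpoint the problem. Something seems to be wrong in the JavaRDD operations. Any help would be greatly appreciated. Thanks in advance.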
Answer 0 (score: 1)
You can use the socat command to forward a local port to the remote Cassandra port:
apt-get install socat
socat tcp-listen:9042,fork tcp:10.0.21.92:9042 &
Answer 1 (score: 0)
The error above was caused by a dependency problem with the Cassandra driver core. I resolved it by adding the metrics dependency to my pom.xml:
<dependency>
    <groupId>io.dropwizard.metrics</groupId>
    <artifactId>metrics-core</artifactId>
    <version>3.2.2</version>
</dependency>
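To confirm whether a missing metrics class is the cause, one option is a quick classpath probe run with the same dependencies as the Spark job. This is a minimal sketch under stated assumptions: `ClasspathCheck` and `isOnClasspath` are hypothetical names, and it assumes the driver in use needs `com.codahale.metrics.MetricRegistry`, the class that metrics-core 3.x provides under that package name.

```java
// Hypothetical helper (not from the original answer): checks whether a class can be loaded.
class ClasspathCheck {
    // Returns true if className can be loaded (without initializing it) by this class's loader.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: this is the class the Cassandra driver fails to find.
        String metricsClass = "com.codahale.metrics.MetricRegistry";
        System.out.println(metricsClass + " on classpath: " + isOnClasspath(metricsClass));
    }
}
```

If this prints `false` before the dependency is added and `true` afterwards, that supports the diagnosis in this answer. Running `mvn dependency:tree` is another way to see which metrics-core version, if any, ends up on the classpath.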