I am trying to persist a Spark stream to Cassandra; this is my code:
JavaDStream<BusinessPointNYCT> studentFileDStream = m_JavaStreamingContext
        .textFileStream(new File(fileDir, "BUSINESSPOINTS_NY_CT.csv").getAbsolutePath())
        .map(new BusinessPointMapFunction());

//Save it to Cassandra
CassandraStreamingJavaUtil.javaFunctions(studentFileDStream)
        .writerBuilder("spatial_keyspace", "businesspoints_ny_ct", mapToRow(BusinessPointNYCT.class))
        .saveToCassandra();
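For context, BusinessPointMapFunction just parses one CSV line into the BusinessPointNYCT bean that mapToRow() maps onto the table columns. A rough sketch (the field names, setters and column order shown here are assumptions, not my real schema):

import org.apache.spark.api.java.function.Function;

public class BusinessPointMapFunction implements Function<String, BusinessPointNYCT> {
    @Override
    public BusinessPointNYCT call(String line) throws Exception {
        String[] fields = line.split(",");          // assumed simple comma-separated layout
        BusinessPointNYCT point = new BusinessPointNYCT();
        point.setId(fields[0]);                     // assumed columns: id, name, lat, lon
        point.setName(fields[1]);
        point.setLatitude(Double.parseDouble(fields[2]));
        point.setLongitude(Double.parseDouble(fields[3]));
        return point;
    }
}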
The application starts without any error or warning, but no data is persisted in Cassandra. Judging by the log, it appears to be removed right after it is stored:
16/04/14 14:54:30 INFO JobScheduler: Added jobs for time 1460625870000 ms
16/04/14 14:54:30 INFO JobScheduler: Starting job streaming job 1460625870000 ms.0 from job set of time 1460625870000 ms
16/04/14 14:54:31 INFO SparkContext: Starting job: runJob at DStreamFunctions.scala:54
16/04/14 14:54:31 INFO DAGScheduler: Job 0 finished: runJob at DStreamFunctions.scala:54, took 0.001267 s
16/04/14 14:54:31 INFO JobScheduler: Finished job streaming job 1460625870000 ms.0 from job set of time 1460625870000 ms
16/04/14 14:54:31 INFO JobScheduler: Total delay: 1.028 s for time 1460625870000 ms (execution: 0.058 s)
16/04/14 14:54:31 INFO FileInputDStream: Cleared 0 old files that were older than 1460625810000 ms:
16/04/14 14:54:31 INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()
16/04/14 14:54:31 INFO ReceiverTracker: Cleanup old received batch data: 1460625810000 ms
16/04/14 14:54:31 INFO InputInfoTracker: remove old batch metadata:
16/04/14 14:54:40 INFO FileInputDStream: Finding new files took 0 ms
16/04/14 14:54:40 INFO FileInputDStream: New files at time 1460625880000 ms:
16/04/14 14:54:40 INFO JobScheduler: Added jobs for time 1460625880000 ms
16/04/14 14:54:40 INFO JobScheduler: Starting job streaming job 1460625880000 ms.0 from job set of time 1460625880000 ms
16/04/14 14:54:40 INFO SparkContext: Starting job: runJob at DStreamFunctions.scala:54
16/04/14 14:54:40 INFO DAGScheduler: Job 1 finished: runJob at DStreamFunctions.scala:54, took 0.000018 s
16/04/14 14:54:40 INFO JobScheduler: Finished job streaming job 1460625880000 ms.0 from job set of time 1460625880000 ms
16/04/14 14:54:40 INFO JobScheduler: Total delay: 0.022 s for time 1460625880000 ms (execution: 0.010 s)
16/04/14 14:54:40 INFO MapPartitionsRDD: Removing RDD 2 from persistence list
16/04/14 14:54:40 INFO MapPartitionsRDD: Removing RDD 1 from persistence list
16/04/14 14:54:40 INFO BlockManager: Removing RDD 2
16/04/14 14:54:40 INFO FileInputDStream: Cleared 0 old files that were older than 1460625820000 ms:
16/04/14 14:54:40 INFO BlockManager: Removing RDD 1
16/04/14 14:54:40 INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()
16/04/14 14:54:40 INFO ReceiverTracker: Cleanup old received batch data: 1460625820000 ms
16/04/14 14:54:40 INFO InputInfoTracker: remove old batch metadata:
16/04/14 14:54:41 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster
16/04/14 14:54:50 INFO FileInputDStream: Finding new files took 1 ms
16/04/14 14:54:50 INFO FileInputDStream: New files at time 1460625890000 ms:
I also verified it from the Cassandra client, and it returns no data:
CassandraSimpleClient client = new CassandraSimpleClient();
client.connect("127.0.0.1");
//Session session = cluster.connect("Your keyspace name");
Session session = client.getActiveCluster().connect("spatial_keyspace");
ResultSet result = session.execute("SELECT * FROM spatial_keyspace.BUSINESSPOINTS_NY_CT");
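Iterating over the result (sketch of how I checked it) confirms the table is empty:

for (Row row : result) {            // com.datastax.driver.core.Row
    System.out.println(row);        // prints nothing - no rows come back
}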
I am stuck here: Spark Streaming does not seem to pick up any data from the text file. Any help is appreciated. Thanks!
textFileStream() did not work for me; I think it only works with HDFS, so I switched to socketTextStream(), and that works fine:
m_JavaStreamingContext.socketTextStream("IN-6WX6152", 9090);
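For reference, the complete pipeline that works for me looks roughly like this (same mapper and Cassandra writer as above, only the input DStream changed; the start/await calls are shown just for completeness):

JavaDStream<BusinessPointNYCT> pointDStream = m_JavaStreamingContext
        .socketTextStream("IN-6WX6152", 9090)        // one CSV record per line from the socket
        .map(new BusinessPointMapFunction());

CassandraStreamingJavaUtil.javaFunctions(pointDStream)
        .writerBuilder("spatial_keyspace", "businesspoints_ny_ct", mapToRow(BusinessPointNYCT.class))
        .saveToCassandra();

m_JavaStreamingContext.start();
m_JavaStreamingContext.awaitTermination();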