Spark Streaming to Cassandra not persisting

Date: 2016-04-14 09:38:46

Tags: apache-spark cassandra spark-streaming

I am trying to persist a Spark stream to Cassandra. Here is my code:

    JavaDStream<BusinessPointNYCT> studentFileDStream = m_JavaStreamingContext
            .textFileStream(new File(fileDir, "BUSINESSPOINTS_NY_CT.csv").getAbsolutePath())
            .map(new BusinessPointMapFunction());

    // Save it to Cassandra
    CassandraStreamingJavaUtil.javaFunctions(studentFileDStream)
            .writerBuilder("spatial_keyspace", "businesspoints_ny_ct", mapToRow(BusinessPointNYCT.class))
            .saveToCassandra();

My application starts without any errors or warnings, but the data is never persisted to Cassandra. According to the logs, it deletes the RDDs right after storing them:

16/04/14 14:54:30 INFO JobScheduler: Added jobs for time 1460625870000 ms
16/04/14 14:54:30 INFO JobScheduler: Starting job streaming job 1460625870000 ms.0 from job set of time 1460625870000 ms
16/04/14 14:54:31 INFO SparkContext: Starting job: runJob at DStreamFunctions.scala:54
16/04/14 14:54:31 INFO DAGScheduler: Job 0 finished: runJob at DStreamFunctions.scala:54, took 0.001267 s
16/04/14 14:54:31 INFO JobScheduler: Finished job streaming job 1460625870000 ms.0 from job set of time 1460625870000 ms
16/04/14 14:54:31 INFO JobScheduler: Total delay: 1.028 s for time 1460625870000 ms (execution: 0.058 s)
16/04/14 14:54:31 INFO FileInputDStream: Cleared 0 old files that were older than 1460625810000 ms: 
16/04/14 14:54:31 INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()
16/04/14 14:54:31 INFO ReceiverTracker: Cleanup old received batch data: 1460625810000 ms
16/04/14 14:54:31 INFO InputInfoTracker: remove old batch metadata: 
16/04/14 14:54:40 INFO FileInputDStream: Finding new files took 0 ms
16/04/14 14:54:40 INFO FileInputDStream: New files at time 1460625880000 ms:

16/04/14 14:54:40 INFO JobScheduler: Added jobs for time 1460625880000 ms
16/04/14 14:54:40 INFO JobScheduler: Starting job streaming job 1460625880000 ms.0 from job set of time 1460625880000 ms
16/04/14 14:54:40 INFO SparkContext: Starting job: runJob at DStreamFunctions.scala:54
16/04/14 14:54:40 INFO DAGScheduler: Job 1 finished: runJob at DStreamFunctions.scala:54, took 0.000018 s
16/04/14 14:54:40 INFO JobScheduler: Finished job streaming job 1460625880000 ms.0 from job set of time 1460625880000 ms
16/04/14 14:54:40 INFO JobScheduler: Total delay: 0.022 s for time 1460625880000 ms (execution: 0.010 s)
16/04/14 14:54:40 INFO MapPartitionsRDD: Removing RDD 2 from persistence list
16/04/14 14:54:40 INFO MapPartitionsRDD: Removing RDD 1 from persistence list
16/04/14 14:54:40 INFO BlockManager: Removing RDD 2
16/04/14 14:54:40 INFO FileInputDStream: Cleared 0 old files that were older than 1460625820000 ms: 
16/04/14 14:54:40 INFO BlockManager: Removing RDD 1
16/04/14 14:54:40 INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()
16/04/14 14:54:40 INFO ReceiverTracker: Cleanup old received batch data: 1460625820000 ms
16/04/14 14:54:40 INFO InputInfoTracker: remove old batch metadata: 
16/04/14 14:54:41 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster
16/04/14 14:54:50 INFO FileInputDStream: Finding new files took 1 ms
16/04/14 14:54:50 INFO FileInputDStream: New files at time 1460625890000 ms:

I also verified this from a Cassandra client, and it returned no data:

    CassandraSimpleClient client = new CassandraSimpleClient();
    client.connect("127.0.0.1");
    // Session session = cluster.connect("Your keyspace name");
    Session session = client.getActiveCluster().connect("spatial_keyspace");

    ResultSet result = session.execute("SELECT * FROM spatial_keyspace.BUSINESSPOINTS_NY_CT");
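CassandraSimpleClient is a custom wrapper, so for completeness, a self-contained equivalent check using the plain DataStax Java driver (Cluster/Session API) might look like the sketch below; the contact point is taken from the snippet above and the table name matches the lowercase name used in the writerBuilder:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class VerifyBusinessPoints {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("spatial_keyspace")) {
                // Unquoted identifiers are case-insensitive in CQL, so this hits
                // the same table the writerBuilder targets.
                ResultSet result = session.execute("SELECT * FROM businesspoints_ny_ct");
                for (Row row : result) {
                    System.out.println(row);
                }
            }
        }
    }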

I am stuck here: Spark Streaming does not seem to pick up any data from the text file. Any help is appreciated, thanks!

Update: textFileStream() did not work for me; I think it only works with HDFS, so I switched to socketTextStream(), which works fine:

    m_JavaStreamingContext.socketTextStream("IN-6WX6152", 9090);
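If it helps, here is a sketch of how that socket stream could be wired into the same Cassandra sink, reusing the classes and imports from the question; the hostname and port come from the line above, and it assumes some external process is writing CSV lines to that port:

    // Hypothetical wiring of the socket stream into the same Cassandra sink.
    JavaDStream<BusinessPointNYCT> socketPoints = m_JavaStreamingContext
            .socketTextStream("IN-6WX6152", 9090)   // an external sender must push CSV lines here
            .map(new BusinessPointMapFunction());

    CassandraStreamingJavaUtil.javaFunctions(socketPoints)
            .writerBuilder("spatial_keyspace", "businesspoints_ny_ct", mapToRow(BusinessPointNYCT.class))
            .saveToCassandra();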

0 Answers