Filtering a Spark CassandraRow RDD per row

Time: 2018-04-09 07:28:03

Tags: scala apache-spark apache-spark-sql spark-cassandra-connector

I have the following code:

    val rss = sc.cassandraTable("db", "table")
      .select("id", "date", "gpsdt")
      .where("id=? and date=? and gpsdt>? and gpsdt<?", entry(0), entry(1), entry(2), entry(3))

    rss.foreach { records =>
      println("Cassandra Row " + records.toString())
      val gpsdttime = records.get[String]("gpsdt")
      // this line is where the job fails: rss is referenced inside the foreach closure
      val justLess = rss
        .filter(row => row.get[String]("gpsdt") < gpsdttime)
        .sortBy(row => row.get[String]("gpsdt"), false)
        .take(1)
    }

So the idea is to select a set of rows from Cassandra matching a where clause, iterate over each row, find the row immediately preceding it (by gpsdt), recompute some values, and update the current row. But this throws the following error:

org.apache.spark.SparkException: This RDD lacks a SparkContext. It could happen in the following cases: 
(1) RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
(2) When a Spark Streaming job recovers from checkpoint, this exception will be hit if a reference to an RDD not defined by the streaming job is used in DStream operations. For more information, See SPARK-13758.
at org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$sc(RDD.scala:89) 

Any suggestions? Thanks.

1 Answer:

Answer 0 (score: 0)

What the exception means is that the SparkContext only exists on the driver, while the function passed to foreach runs on the executors; so when you run this job, it throws the following exception:

org.apache.spark.SparkException: This RDD lacks a SparkContext.

Your case is (1): RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063. Here, rss.filter(...).sortBy(...).take(1) is invoked inside rss.foreach, which is exactly this invalid nesting.
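
One way to avoid the nesting (a minimal sketch, not from the original answer, assuming the rows selected for a given id/date are totally ordered by gpsdt) is to compute every row's predecessor in a single pass: sort the RDD once, index it, and join it with an index-shifted copy of itself, so no RDD operation ever runs inside another closure:

    // Pair each row with its predecessor via one sort + zipWithIndex,
    // keeping all RDD operations on the driver side.
    val indexed = rss
      .sortBy(row => row.get[String]("gpsdt"))   // ascending by gpsdt
      .zipWithIndex()                            // (row, 0-based position)
      .map { case (row, idx) => (idx, row) }

    // Shift indices by one so that key i now holds row i - 1.
    val predecessors = indexed.map { case (idx, row) => (idx + 1, row) }

    // The join yields (current, previous) pairs; the first row has no
    // predecessor and simply drops out of the result.
    val withPrev = indexed.join(predecessors).values

    withPrev.foreach { case (current, prev) =>
      // recompute and persist values for `current` using `prev` here
      println(s"current=${current.get[String]("gpsdt")} prev=${prev.get[String]("gpsdt")}")
    }

If the rows matching one id/date pair are few, an even simpler option is to collect() them to the driver, sort the resulting array by gpsdt, and walk it locally with sliding(2).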