Spark Flume streaming Scala foreachPartition

Asked: 2015-03-20 17:25:07

Tags: scala apache-spark flume spark-streaming

I'm trying to process events from a flume-avro sink with Spark Streaming, following the "Design Patterns for using foreachRDD" section of the docs, but for some reason the code marked "DOES NOT WORK" never executes. rdd.partitions.size returns 1, yet it behaves as though that single partition is never iterated. PS: I'm a Scala noob.

    events.foreachRDD { rdd =>
      if (rdd.take(1).size == 1) {
        System.out.println("**********************************WE GOT AN RDD")
        System.out.println("*******************************NUM PARTITIONS =" + rdd.partitions.size)
        val array = rdd.collect()
        array.foreach { x =>
          System.out.println("**************WORKS********************" + new String(x.event.getBody().array(), "UTF-8"))
        }
        rdd.foreachPartition { partitionItr =>
          //System.out.println("**********************************WE NEVER GET HERE " + partitionItr.size)
          //create db connection from pool
          //val connection = ConnectionPool.getConnection()
          partitionItr.foreach { item =>
            //write to db
            System.out.println("****************DOES NOT WORK******************" + new String(item.event.getBody().array(), "UTF-8"))
            //return connection to pool
            //ConnectionPool.returnConnection(connection)
          }
        }
        //rdd.count()
      } else {
        System.out.println("**********************************WE GOT NOTHIN")
      }
    }
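For context, here is a minimal, self-contained sketch of how this pattern is usually wired up end to end. The stream setup is not shown in the question, so the FlumeUtils.createPollingStream call, the localhost:9999 address, the batch interval, and the commented-out ConnectionPool helper are all assumptions for illustration, not the asker's actual setup:

    // Sketch: Flume polling source + per-partition processing, assuming the
    // spark-streaming-flume integration and a flume "spark sink" on localhost:9999.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.flume.FlumeUtils

    object FlumeEventWriter {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("FlumeEventWriter")
        val ssc  = new StreamingContext(conf, Seconds(5)) // batch interval assumed

        // Pull SparkFlumeEvents from the Flume spark sink (host/port assumed)
        val events = FlumeUtils.createPollingStream(ssc, "localhost", 9999)

        events.foreachRDD { rdd =>
          rdd.foreachPartition { partitionItr =>
            // One connection per partition, reused for every record in it
            // val connection = ConnectionPool.getConnection() // hypothetical pool
            partitionItr.foreach { item =>
              val body = new String(item.event.getBody.array(), "UTF-8")
              println(body) // replace with the actual db write
            }
            // ConnectionPool.returnConnection(connection)
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }

The point of the foreachPartition variant in the docs is to open one connection per partition rather than per record; the inner loop then reuses that connection for every event in the partition.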

0 Answers:

No answers yet.