Question

I'm running into some very strange behavior when using the union method of the RDD class, and I can't understand why it happens.

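I have a class with two fields, data and filteredData. The first is initialized from textFile followed by some map and filter calls; the second is sc.emptyRDD[Point]. I then have a method that does the following: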

do{
  /**
  * here val lastFiltered is computed
  */

  logger.debug("Filtered "+lastFiltered.count()+" points")

  filteredData=filteredData.union(lastFiltered)

  logger.debug("Filtered so far "+filteredData.count()+" points")

  data = data.subtract(lastFiltered)

  /** here data is repartitioned **/
}while(/** here there is a condition which is equivalent to lastFiltered.count() > 0 **/)
logger.info("Preprocessing has filtered "+filteredData.count()+" points")
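
For context, the two fields that this loop works on could be declared roughly as in the sketch below. This is only an illustration: the Point case class, the Preprocessor class name, the inputPath parameter, and the comma-separated "x,y" line format are all assumptions, since the question does not show them.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical point type; the real Point class is not shown in the question.
case class Point(x: Double, y: Double)

// Hypothetical holder class; its name and constructor are assumptions.
class Preprocessor(sc: SparkContext, inputPath: String) {

  // Built from textFile plus map/filter steps, as the question describes.
  // The "x,y" line format is assumed purely for illustration.
  var data: RDD[Point] = sc.textFile(inputPath)
    .map(_.split(","))
    .filter(_.length == 2)
    .map(fields => Point(fields(0).trim.toDouble, fields(1).trim.toDouble))

  // Starts empty and grows through union on each pass of the loop.
  var filteredData: RDD[Point] = sc.emptyRDD[Point]
}
```

Both fields are declared as var because the loop reassigns them on every iteration.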

The output I get from the logger looks very strange to me:

 Filtered 13 points
 Filtered so far 13 points
 Filtered 4 points
 Filtered so far 4834 points
 Filtered 0 points
 Filtered so far 0 points
 Preprocessing has filtered 0 points

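The first two lines are exactly what I expect, but everything after that makes no sense to me. In addition, the subtract method on the data RDD seems to work correctly (the counts are as expected).

Can anyone help me understand what is happening?

Thanks! Marco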

Strange behavior of Spark union

0 answers: