Iterating over a MapPartitionsRDD - Scala

Time: 2017-09-20 01:41:21

Tags: scala apache-spark rdd

I'm reading through someone else's Scala code and I can't iterate over an RDD. I just want to print its contents.

val neighborRDD: RDD[(Long, Array[(Row, Double)])]

This is the RDD whose contents I want to see. The suggestion people give in other questions, neighborRDD.foreach(println), doesn't work, with or without .collect() first. Any help with this?
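
For reference, here is a minimal, self-contained sketch of the two usual ways to print an RDD of this shape. The object name, dummy data, and local[*] master are illustrative stand-ins, not taken from the original code, and a Spark 2.x SparkSession is assumed. Two details often make foreach(println) look broken: on a cluster, foreach runs on the executors, so its output lands in the executor logs rather than the driver console, and the default toString of an Array hides its elements.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SparkSession}

    object PrintNeighborRDD {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("print-neighbor-rdd")
          .master("local[*]") // local mode, so executor output reaches this console
          .getOrCreate()

        // Dummy data standing in for the real neighborRDD.
        val neighborRDD: RDD[(Long, Array[(Row, Double)])] =
          spark.sparkContext.parallelize(Seq(
            (1L, Array((Row(0.1, 0.2), 0.5), (Row(0.3, 0.4), 1.2))),
            (2L, Array((Row(0.5, 0.6), 0.9)))
          ))

        // Array's default toString hides its elements, so format each
        // (Row, distance) pair explicitly.
        val format = (id: Long, neighbors: Array[(Row, Double)]) =>
          s"$id -> " + neighbors.map { case (r, d) => s"($r, $d)" }.mkString(", ")

        // Option 1: print where the data lives. On a real cluster this
        // goes to the executors' stdout, not the driver console.
        neighborRDD.foreach { case (id, ns) => println(format(id, ns)) }

        // Option 2: bring the data to the driver first (only safe for
        // small RDDs), then print locally.
        neighborRDD.collect().foreach { case (id, ns) => println(format(id, ns)) }

        spark.stop()
      }
    }

On a real cluster, neighborRDD.take(10) gives a bounded sample to format on the driver, which is safer than collect() when the RDD is large.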

Edit: adding the error message I get when I run neighborRDD.foreach(println):

Traceback (most recent call last):
  File "/home/yhkwon/Desktop/knn/spark-knn/python/test.py", line 33, in <module>
    predictions = model.transform(test)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 105, in transform
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py", line 281, in _transform
  File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o99.transform.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0 (TID 32, 192.168.0.18, executor 0): java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.sql.Row
    at org.apache.spark.ml.classification.KNNClassificationModel$$anonfun$transform$1.apply(KNNClassifier.scala:183)
    at org.apache.spark.ml.classification.KNNClassificationModel$$anonfun$transform$1.apply(KNNClassifier.scala:183)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:918)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:916)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.foreach(RDD.scala:916)
    at org.apache.spark.ml.classification.KNNClassificationModel.transform(KNNClassifier.scala:164)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.sql.Row
    at org.apache.spark.ml.classification.KNNClassificationModel$$anonfun$transform$1.apply(KNNClassifier.scala:183)
    at org.apache.spark.ml.classification.KNNClassificationModel$$anonfun$transform$1.apply(KNNClassifier.scala:183)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:918)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
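
As for the exception itself: it is a plain JVM cast failure, raised at KNNClassifier.scala:183, where something typed as java.lang.Long is cast to org.apache.spark.sql.Row. A minimal sketch of that failure mode in isolation follows; this only illustrates the cast, it is not spark-knn's actual code, which I have not inspected.

    import org.apache.spark.sql.Row

    object CastFailureDemo {
      def main(args: Array[String]): Unit = {
        // A value typed as Any that actually holds a boxed Long.
        val element: Any = 42L

        // The cast compiles (casts from Any are unchecked) but fails at
        // runtime with the same message seen in the trace above:
        // java.lang.ClassCastException: java.lang.Long cannot be cast
        // to org.apache.spark.sql.Row
        val row = element.asInstanceOf[Row]
        println(row)
      }
    }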

0 Answers:

No answers yet