I need to perform the following join operation in Spark:
JavaPairRDD<String, Tuple2<Optional<MarkToMarketPNL>, Optional<MarkToMarketPNL>>> finalMTMPNLRDD = openMTMPNL.fullOuterJoin(closedMTMPNL);
To do this I need two JavaPairRDDs, closedMTMPNL and openMTMPNL. openMTM and closedMTM work fine, but keyBy on both RDDs throws an error at runtime.
JavaPairRDD<String,MarkToMarketPNL> openMTMPNL = openMTM.keyBy(new Function<MarkToMarketPNL,String>(){
public String call(MarkToMarketPNL mtm) throws Exception
{
return mtm.getTaxlot();
}
});
JavaPairRDD<String,MarkToMarketPNL> closedMTMPNL = closedMTM.keyBy(new Function<MarkToMarketPNL,String>(){
public String call(MarkToMarketPNL mtm) throws Exception
{
return mtm.getTaxlot();
}
});
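Is there another way I can join the openMTM and closedMTM RDDs? For now I'm just trying to get two RDDs keyed on a String so that I can join them. What is causing the exception?
Stack trace: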
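To make the intended keyBy + fullOuterJoin semantics concrete, here is a plain-Java model that runs without Spark. It is only a sketch under stated assumptions: the class FullOuterJoinModel and its Pair type are hypothetical stand-ins for JavaPairRDD and Tuple2, it uses java.util.Optional (Spark's Java API uses its own Optional type), and it assumes at most one record per key on each side, whereas Spark's fullOuterJoin emits one output row per matching pair of values.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical plain-Java model of keyBy followed by fullOuterJoin; NOT Spark code.
class FullOuterJoinModel {

    // Hypothetical stand-in for Spark's Tuple2<Optional<V>, Optional<W>>.
    static final class Pair<L, R> {
        final Optional<L> left;
        final Optional<R> right;

        Pair(Optional<L> left, Optional<R> right) {
            this.left = left;
            this.right = right;
        }
    }

    // Model of keyBy: pair every record with the key computed from it.
    // Simplification: a later record with the same key replaces the earlier one.
    static <K, V> Map<K, V> keyBy(List<V> records, Function<V, K> keyFn) {
        Map<K, V> keyed = new LinkedHashMap<>();
        for (V record : records) {
            keyed.put(keyFn.apply(record), record);
        }
        return keyed;
    }

    // Model of fullOuterJoin: every key from either side appears exactly once,
    // and a side with no record for that key becomes Optional.empty().
    static <K, L, R> Map<K, Pair<L, R>> fullOuterJoin(Map<K, L> left, Map<K, R> right) {
        Set<K> keys = new LinkedHashSet<>(left.keySet());
        keys.addAll(right.keySet());
        Map<K, Pair<L, R>> joined = new LinkedHashMap<>();
        for (K key : keys) {
            joined.put(key, new Pair<>(Optional.ofNullable(left.get(key)),
                                       Optional.ofNullable(right.get(key))));
        }
        return joined;
    }
}
```

For example, keying ["A-open", "B-open"] and ["B-closed", "C-closed"] by their first letter gives A only on the left, B on both sides, and C only on the right.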
java.lang.NullPointerException
15/06/28 01:19:30 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.NullPointerException
at scala.collection.convert.Wrappers$JIterableWrapper.iterator(Wrappers.scala:53)
at scala.collection.IterableLike$class.toIterator(IterableLike.scala:89)
at scala.collection.AbstractIterable.toIterator(Iterable.scala:54)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1626)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1095)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1095)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
15/06/28 01:19:30 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost): java.lang.NullPointerException
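Answer 0 (score: 1)
This exception is caused by a null value being returned from one of your functions. You can return null and then filter out the null entries afterwards, for example: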
JavaPairRDD<String,MarkToMarketPNL> openMTMPNL = openMTM.keyBy(new Function<MarkToMarketPNL,String>(){
public String call(MarkToMarketPNL mtm) throws Exception
{
// return a null key instead of throwing when the record itself is null
return mtm == null ? null : mtm.getTaxlot();
}
}).filter(new Function<Tuple2<String, MarkToMarketPNL>, Boolean>() {
@Override
public Boolean call(Tuple2<String, MarkToMarketPNL> arg) throws Exception {
// keyBy never emits a null tuple, so test the key and the record instead
return arg._1() != null && arg._2() != null;
}
});
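Note that keyBy always builds a (key, record) pair, so the tuple handed to the filter is never null itself; it is the key or the record inside it that can be null. The minimal plain-Java sketch below models the same keyBy-then-filter step without Spark. KeyByFilterModel is a hypothetical name, and Map.Entry stands in for scala.Tuple2.

```java
import java.util.*;
import java.util.AbstractMap.SimpleEntry;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical plain-Java model of keyBy(...).filter(...); not Spark code.
// Map.Entry stands in for scala.Tuple2.
class KeyByFilterModel {

    // keyBy always creates a (key, record) pair, even when the key is null,
    // so the pair object itself is never null.
    static <K, V> List<Map.Entry<K, V>> keyBy(List<V> records, Function<V, K> keyFn) {
        return records.stream()
                .<Map.Entry<K, V>>map(v -> new SimpleEntry<>(keyFn.apply(v), v))
                .collect(Collectors.toList());
    }

    // The useful check is on the pair's contents: drop null keys and null records.
    static <K, V> List<Map.Entry<K, V>> dropNulls(List<Map.Entry<K, V>> pairs) {
        return pairs.stream()
                .filter(e -> e.getKey() != null && e.getValue() != null)
                .collect(Collectors.toList());
    }
}
```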
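Answer 1 (score: 0)
I don't think the error is in the code you included in the question. Spark is trying to run count on an RDD, and the code you provided doesn't call count, so that is one clue. The exception also shows that the RDD being computed relies on a Java Iterable that is now being converted to a Scala iterator, and at that point the Iterable turns out to be null.
Does your code produce an iterator or Iterable somewhere? Perhaps in a mapPartitions call or something similar?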
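In the Spark 1.x Java API, the functions passed to flatMap and mapPartitions (FlatMapFunction) return an Iterable, which Spark wraps for Scala; that matches the JIterableWrapper.iterator frame in the trace. The plain-Java sketch below (hypothetical FlatMapNullModel, not Spark) shows how a function that returns null instead of an empty collection fails the same way, and how returning an empty Iterable avoids it.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical plain-Java model of a flatMap that, like Spark's wrapper,
// calls iterator() on whatever Iterable the user function returns.
class FlatMapNullModel {

    static <T, R> List<R> flatMap(List<T> input, Function<T, Iterable<R>> fn) {
        List<R> out = new ArrayList<>();
        for (T t : input) {
            // A null Iterable fails here with NullPointerException,
            // much like JIterableWrapper.iterator in the stack trace.
            Iterator<R> it = fn.apply(t).iterator();
            while (it.hasNext()) {
                out.add(it.next());
            }
        }
        return out;
    }

    // Buggy user function: returns null when there is nothing to emit.
    static Iterable<String> splitOrNull(String line) {
        return line.isEmpty() ? null : Arrays.asList(line.split(","));
    }

    // Fixed user function: returns an empty Iterable instead of null.
    static Iterable<String> splitOrEmpty(String line) {
        return line.isEmpty() ? Collections.<String>emptyList() : Arrays.asList(line.split(","));
    }
}
```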
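Answer 2 (score: 0)
I ran into the same problem. Internally, the join operation works on <key, Iterable<values>> pairs. If one of the Iterable<values> objects is null, you get the NullPointerException shown above.
Make sure none of the values are null before performing the join.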
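The advice in Answer 2 can be sketched in plain Java as well. The hypothetical CogroupJoinModel below groups each side into <key, List<values>> (skipping null values, so every group is a non-null list) and then joins the groups key by key, emitting every left/right combination. It only models the shape Answer 2 describes and is not Spark's actual implementation; the "key:left/right" string output is just a convenient format for illustration.

```java
import java.util.*;

// Hypothetical plain-Java model (not Spark) of a join built from per-key value groups.
class CogroupJoinModel {

    // Group values by key, skipping null values so every group is a non-null list.
    static <K, V> Map<K, List<V>> group(List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> groups = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : pairs) {
            if (e.getValue() == null) {
                continue; // clean out nulls before the join
            }
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return groups;
    }

    // Inner join: for each key present on both sides, emit every left/right combination.
    static <K, L, R> List<String> join(Map<K, List<L>> left, Map<K, List<R>> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<K, List<L>> e : left.entrySet()) {
            List<R> rs = right.getOrDefault(e.getKey(), Collections.emptyList());
            for (L l : e.getValue()) {
                for (R r : rs) {
                    out.add(e.getKey() + ":" + l + "/" + r);
                }
            }
        }
        return out;
    }
}
```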