如何在java8中使用lambda表达式,以便它可以在spark上运行

时间:2017-03-13 02:30:10

标签: apache-spark lambda java-8

我正在学习spark2.1.0,而我现在正在日食上写出火花WordCount。
我在远程计算机上运行的火花及其名称是 c1 。主人是spark://c1:7077
我的文件是在hadoop2.3.7上。输入文件位于hdfs://c1:9000/tmp/README.md

这是我的代码:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {

    public static void main(String[] args) {
        String input = "hdfs://c1:9000/tmp/README.md"; 
        String output = "hdfs://c1:9000/tmp/output";

        SparkConf conf = new SparkConf().setAppName("Word Count").setMaster("spark://c1:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> textFile = sc.textFile(input);
        JavaPairRDD<String, Integer> counts = textFile
                .flatMap(s -> Arrays.asList(s.split(" ")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b);
        counts.saveAsTextFile(output);
        sc.close();
    }
}

当我在日食上运行时,我遇到了问题。

17/03/12 22:25:50 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.122.2, executor 1): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.f$3 of type org.apache.spark.api.java.function.FlatMapFunction in instance of org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2237)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:479)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2122)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

我在谷歌上搜索过这个问题。我找到了same problem。在评论中,我看到Check that the class containing the lambda expression, i.e. SparkDemo is available on the remote side.。我无法理解这句话。我只能在火花机上运行我的程序?所以我必须将我的程序复制到 c1 并在 c1 上运行程序?

所以我的问题是: 如何在我的计算机上运行我的程序而不是 c1

我认为我的问题与this question重复。获得19个支持的答案解释了该问题是如何发生的。然后我尝试使用setJars方法但问题仍然存在。

0 个答案:

没有答案