Spring Boot + Spark on a cluster, YARN client mode

Posted: 2017-12-04 04:07:17

Tags: apache-spark spring-boot java-8

I am trying to expose a web service from Spark using Spring Boot, but something seems to be wrong. When I debug locally it runs fine, but when I deploy it to the cluster and run it with spark-submit, it fails.

Source code:

public List<Person> txtlist(String sqlstr) {
    JavaRDD<Person> peopleRDD = sparkSession.read()
            .textFile("hdfs://master:9000/user/root/people.txt")
            .javaRDD()
            .map(line -> {
                String[] parts = line.split(",");
                Person person = new Person();
                person.setName(parts[0]);
                person.setAge(Long.parseLong(parts[1].trim()));
                return person;
            });
    Dataset<Row> peopleDF = sparkSession.createDataFrame(peopleRDD, Person.class);

    peopleDF.createOrReplaceTempView("people");
    Dataset<Person> sqlFrame = sparkSession.sql(sqlstr).as(Encoders.bean(Person.class));
    List<Person> plist = sqlFrame.collectAsList();
    return plist;
}
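A workaround often suggested for the SerializedLambda error below is to replace the lambda passed to `map()` with a named static class, so the executor only has to load an ordinary class file instead of resolving a synthetic lambda against the driver's classloader. A minimal, Spark-free sketch of such a mapper (the `Person` bean here is a stand-in for the one in the project; in the real code the class would additionally implement `org.apache.spark.api.java.function.Function<String, Person>`):

```java
import java.io.Serializable;

public class PersonMapper implements Serializable {
    private static final long serialVersionUID = 1L;

    // Stand-in for the project's Person bean (name + age).
    public static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        private String name;
        private long age;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public long getAge() { return age; }
        public void setAge(long age) { this.age = age; }
    }

    // Same parsing logic as the lambda in txtlist(), but in a named class:
    // Spark can ship an instance of this with plain Java serialization,
    // avoiding java.lang.invoke.SerializedLambda entirely.
    public Person call(String line) {
        String[] parts = line.split(",");
        Person person = new Person();
        person.setName(parts[0]);
        person.setAge(Long.parseLong(parts[1].trim()));
        return person;
    }
}
```

With Spark's `Function` interface implemented, the call site becomes `lines.map(new PersonMapper())` instead of the inline lambda.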

Error log:

17/11/24 00:35:30 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, slave3, executor 2): 
java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.fun$1 of type org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2251)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)

I tried several solutions from the internet, such as `.setJars()` and various spark-submit options, but nothing helped.

Has anyone used Spring Boot with lambdas in Spark, hit this problem and solved it, or got it running correctly on a cluster?
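One frequently reported cause of this exact error with Spring Boot is the repackaged ("fat") jar: the application classes end up nested under `BOOT-INF/classes`, where a Spark executor's standard classloader cannot find the class that defined the lambda, so deserializing the `SerializedLambda` fails. A commonly suggested workaround is to submit a flat shaded jar (built e.g. with the maven-shade-plugin) instead of the Boot-repackaged one. A hypothetical sketch, where the paths and the main class name are placeholders, not values from the original post:

```shell
# Assumption: app-shaded.jar is a flat jar produced by the maven-shade-plugin,
# with application classes at the jar root rather than under BOOT-INF/classes.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.demo.Application \
  /path/to/app-shaded.jar
```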


Update

I tried a different version of the code, but it fails as well.

JavaRDD<String> lines = sparkSession.read()
        .textFile("hdfs://master:9000/user/root/people.txt")
        .javaRDD();
JavaRDD<Row> peopleRDD = lines.map(new Function<String, Row>() {

    private static final long serialVersionUID = 1L;

    @Override
    public Row call(String line) throws Exception {
        String[] splited = line.split(",");
        return RowFactory.create(splited[0], Integer.valueOf(splited[1]));
    }
});
List<StructField> structFields = new ArrayList<StructField>();
structFields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
structFields.add(DataTypes.createStructField("age", DataTypes.IntegerType, true));
StructType structType = DataTypes.createStructType(structFields);
Dataset<Row> peopleDF = sparkSession.createDataFrame(peopleRDD, structType);

Error:

cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.

Could this be a serialization problem?
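Most reports of this error do trace back to serialization: a Java lambda only serializes through `java.lang.invoke.SerializedLambda`, and deserializing it requires the class that defined the lambda to be resolvable on the receiving side, which the Spring Boot nested-jar layout can break on executors. A small JDK-only sketch of that mechanism (no Spark involved):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class LambdaSerializationDemo {

    // Serialize an object to bytes, roughly what Spark does when it ships a
    // task closure from the driver to an executor.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    static boolean isSerializable(Object obj) {
        try {
            toBytes(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // A plain JDK lambda is not serializable at all.
        Function<String, Integer> plain = s -> s.length();
        System.out.println(isSerializable(plain));  // false

        // Cast to a Serializable intersection type, it serializes via
        // java.lang.invoke.SerializedLambda. Deserialization must then resolve
        // the class that defined the lambda; if that class is hidden inside a
        // Spring Boot nested jar, the executor cannot load it, which is
        // consistent with the ClassCastException above. Here both ends are the
        // same JVM, so the round trip succeeds.
        Function<String, Integer> serial =
                (Function<String, Integer> & Serializable) s -> s.length();
        byte[] bytes = toBytes(serial);
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            @SuppressWarnings("unchecked")
            Function<String, Integer> back =
                    (Function<String, Integer>) ois.readObject();
            System.out.println(back.apply("spark"));  // 5
        }
    }
}
```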

0 Answers:

No answers yet.