I have been trying to use Spring Boot to expose a web service on top of Spark, but I have run into a problem. When I debug locally everything runs fine, but when I deploy to the cluster and run it with spark-submit, it fails.
Source listing
public List<Person> txtlist(String sqlstr) {
    // Read the text file from HDFS and map each line to a Person bean
    JavaRDD<Person> peopleRDD = sparkSession.read()
            .textFile("hdfs://master:9000/user/root/people.txt")
            .javaRDD()
            .map(line -> {
                String[] parts = line.split(",");
                Person person = new Person();
                person.setName(parts[0]);
                person.setAge(Long.parseLong(parts[1].trim()));
                return person;
            });
    // Register the data as a temporary view and run the caller's SQL against it
    Dataset<Row> peopleDF = sparkSession.createDataFrame(peopleRDD, Person.class);
    peopleDF.createOrReplaceTempView("people");
    Dataset<Person> sqlFrame = sparkSession.sql(sqlstr).as(Encoders.bean(Person.class));
    return sqlFrame.collectAsList();
}
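For completeness, Person is just a plain serializable bean along these lines (reconstructed here from the calls above; Encoders.bean needs the no-arg constructor and getters/setters), so I don't think the bean itself is the culprit:

import java.io.Serializable;

public class Person implements Serializable {
    private String name;
    private Long age;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Long getAge() { return age; }
    public void setAge(Long age) { this.age = age; }
}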
Error listing
17/11/24 00:35:30 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, slave3, executor 2):
java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.fun$1 of type org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2251)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
I have tried some solutions found online, such as calling .setJars() and passing the jar to spark-submit, but none of them helped.
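For reference, this is roughly how I wired .setJars() into the SparkSession; the jar path and master URL below are placeholders, not my exact values:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

SparkConf conf = new SparkConf()
        .setAppName("spark-web-service")
        .setMaster("spark://master:7077")
        // ship the application jar to the executors; the path is illustrative
        .setJars(new String[] {"/path/to/app.jar"});
SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate();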
Has anyone used Spring Boot with lambdas in Spark, run into this problem and solved it, or gotten such an app to run correctly on a cluster?
Update
I tried a different version of the code, but it also fails.
JavaRDD<String> lines = sparkSession.read()
        .textFile("hdfs://master:9000/user/root/people.txt")
        .javaRDD();
// Use a named (non-lambda) Function to rule out the SerializedLambda issue
JavaRDD<Row> peopleRDD = lines.map(new Function<String, Row>() {
    private static final long serialVersionUID = 1L;

    @Override
    public Row call(String line) throws Exception {
        String[] parts = line.split(",");
        return RowFactory.create(parts[0], Integer.valueOf(parts[1].trim()));
    }
});
// Build the schema explicitly instead of relying on the bean encoder
List<StructField> structFields = new ArrayList<StructField>();
structFields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
structFields.add(DataTypes.createStructField("age", DataTypes.IntegerType, true));
StructType structType = DataTypes.createStructType(structFields);
// The snippet then applies the schema and registers the view as before
Dataset<Row> peopleDF = sparkSession.createDataFrame(peopleRDD, structType);
peopleDF.createOrReplaceTempView("people");
Error:
cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.
Could this be a serialization problem?