I am trying to convert a JavaRDD<String> to a JavaRDD<Row> using an anonymous function. Here is my code:
JavaRDD<String> listData = jsc.textFile("/src/main/resources/CorrectLabels.csv");
JavaRDD<Row> jrdd = listData.map(new Function<String, Row>() {
    public Row call(String record) throws Exception {
        String[] fields = record.split(",");
        return RowFactory.create(fields[1], fields[0].trim());
    }
});
But when I run this, I get an error:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
Stack details:
Serialization stack:
- object not serializable (class: com.cpny.ml.supervised.FeatureExtractor, value: com.cpny.ml.supervised.FeatureExtractor@421056e5)
- field (class: com.cpny.ml.supervised.FeatureExtractor$1, name: this$0, type: class com.cpny.ml.supervised.FeatureExtractor)
- object (class com.cpny.ml.supervised.FeatureExtractor$1, com.cpny.ml.supervised.FeatureExtractor$1@227a47)
- field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
- object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
Any idea where I am going wrong?
Thanks!
Answer 0 (score: 1)
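The exception you are getting has nothing to do with the anonymous function as such. The problem is that the FeatureExtractor class is not Serializable, or it contains fields that are not Serializable. As the serialization stack shows, the anonymous class FeatureExtractor$1 has a field this$0 that refers to its enclosing FeatureExtractor instance, so Spark has to serialize that instance along with the function.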
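The mechanism visible in the serialization stack (the this$0 field of FeatureExtractor$1) can be reproduced with plain Java serialization, without Spark. The following is a minimal sketch under stated assumptions: SerializableFunction is a hypothetical stand-in for Spark's Function interface (which extends java.io.Serializable), and Extractor is a hypothetical stand-in for FeatureExtractor. An anonymous class created in an instance method keeps a reference to the enclosing instance, so serializing the function also tries to serialize that instance.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Spark's Function<T, R>, which also extends Serializable.
interface SerializableFunction<T, R> extends Serializable {
    R call(T t) throws Exception;
}

// Hypothetical stand-in for FeatureExtractor: it does NOT implement Serializable.
class Extractor {
    private final String separator; // set in the constructor, so it is not inlined as a constant

    Extractor(String separator) {
        this.separator = separator;
    }

    SerializableFunction<String, String> makeParser() {
        // Anonymous class created in an instance method: it holds a hidden reference
        // (this$0) to the enclosing Extractor, which it also needs to read `separator`.
        return new SerializableFunction<String, String>() {
            @Override
            public String call(String line) {
                String[] fields = line.split(",");
                return fields[1] + separator + fields[0].trim();
            }
        };
    }
}

class CaptureDemo {
    // Returns true if Java serialization succeeds, false on NotSerializableException.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SerializableFunction<String, String> parser = new Extractor("|").makeParser();
        // Serializing the parser drags in the non-serializable Extractor via this$0.
        System.out.println("parser serializable? " + canSerialize(parser));
    }
}
```

Running main prints "parser serializable? false": ObjectOutputStream reaches the Extractor through the captured reference and throws NotSerializableException, which is the same failure Spark reports as "Task not serializable".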
Answer 1 (score: 0)
Thanks @slovit.
My previous setup was: MainClass calls FeatureExtractor to get the JavaRDD. FeatureExtractor was not Serializable before. After making it Serializable, I no longer get the error.
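The fix described here can be sketched the same way with plain Java serialization. This is a minimal illustration, not the poster's actual code: SerializableExtractor is a hypothetical stand-in for FeatureExtractor after the change, and SerializableFunction is a hypothetical stand-in for Spark's Function interface. Once the enclosing class implements Serializable, the hidden this$0 reference held by the anonymous class can be serialized along with the function.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Spark's Function<T, R>, which also extends Serializable.
interface SerializableFunction<T, R> extends Serializable {
    R call(T t) throws Exception;
}

// Hypothetical stand-in for FeatureExtractor after the fix: it now implements Serializable.
class SerializableExtractor implements Serializable {
    private static final long serialVersionUID = 1L;
    // Every non-transient field must itself be serializable; String is.
    private final String separator;

    SerializableExtractor(String separator) {
        this.separator = separator;
    }

    SerializableFunction<String, String> makeParser() {
        // Still captures the enclosing instance via this$0, but that instance is now serializable.
        return new SerializableFunction<String, String>() {
            @Override
            public String call(String line) {
                String[] fields = line.split(",");
                return fields[1] + separator + fields[0].trim();
            }
        };
    }
}

class FixDemo {
    // Returns true if Java serialization succeeds, false on NotSerializableException.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SerializableFunction<String, String> parser = new SerializableExtractor("|").makeParser();
        System.out.println("parser serializable? " + canSerialize(parser));
    }
}
```

Running main prints "parser serializable? true". As the first answer notes, this only holds if every non-transient field of the enclosing class is serializable too; a field of a non-serializable type would have to be marked transient or moved out of the class.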
On the other hand, MainClass is the entry point I use to submit the Spark job:
./bin/spark-submit --class com.cpny.ml.supervised.MainClass --master spark://localhost:7077 /mltraining/target/mltraining-0.0.1-SNAPSHOT.jar
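But MainClass is not marked as Serializable either, and yet when I put the anonymous function inside MainClass, I don't get the error. How can MainClass be serialized when the other class cannot?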
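A likely explanation, assuming the anonymous function in MainClass was written directly inside the static main method: under the Java Language Specification, an anonymous class created in a static context has no enclosing instance, so it has no this$0 field and captures nothing from MainClass. MainClass itself is then never serialized. The sketch below contrasts the two cases; StaticContextDemo is a hypothetical stand-in for MainClass and SerializableFunction a hypothetical stand-in for Spark's Function interface.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Spark's Function<T, R>, which also extends Serializable.
interface SerializableFunction<T, R> extends Serializable {
    R call(T t) throws Exception;
}

// Hypothetical stand-in for MainClass: it does NOT implement Serializable.
class StaticContextDemo {
    private final String separator;

    StaticContextDemo(String separator) {
        this.separator = separator;
    }

    // Created in a static context: the anonymous class has no enclosing instance to capture.
    static SerializableFunction<String, String> parserFromStaticMethod() {
        return new SerializableFunction<String, String>() {
            @Override
            public String call(String line) {
                String[] fields = line.split(",");
                return fields[1] + "|" + fields[0].trim();
            }
        };
    }

    // Created in an instance method and reading an instance field: captures `this` via this$0.
    SerializableFunction<String, String> parserFromInstanceMethod() {
        return new SerializableFunction<String, String>() {
            @Override
            public String call(String line) {
                String[] fields = line.split(",");
                return fields[1] + separator + fields[0].trim();
            }
        };
    }

    // Returns true if Java serialization succeeds, false on NotSerializableException.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("static-context parser serializable? "
                + canSerialize(parserFromStaticMethod()));
        System.out.println("instance-method parser serializable? "
                + canSerialize(new StaticContextDemo("|").parserFromInstanceMethod()));
    }
}
```

Running main prints true for the static-context parser and false for the instance-method one, even though both are defined in the same non-serializable class.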