I'm new to Spark Streaming programming. Can someone explain to me what is going wrong? I think I am iterating over an empty structure, but my producer class works correctly.
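For context, the producer class is not shown; a minimal sketch of what a working producer for the "testing" topic could look like (the class name and the sample record are hypothetical; the serializers mirror the consumer settings below) is:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ProducerSketch {
        public static void main(String[] args) {
            // Hypothetical producer: plain String keys/values sent to the
            // "testing" topic on the same broker the consumer subscribes to.
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // One record in the "<id> <double> <double> ..." format that
                // the mapToPair in the consumer expects.
                producer.send(new ProducerRecord<>("testing", "1 0.5 1.5 2.5"));
            }
        }
    }

My source code: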
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import scala.Tuple2;

public class Main3 implements java.io.Serializable {

    public static JavaDStream<Double> pr;

    public void consumer() throws Exception {
        // Configure Spark to connect to Kafka running on the local machine
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        kafkaParams.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
        kafkaParams.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        kafkaParams.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);

        Collection<String> topics = Arrays.asList("testing");

        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("SparkKafka10WordCount");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        final JavaInputDStream<ConsumerRecord<String, String>> receiver =
                KafkaUtils.createDirectStream(jssc, LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        JavaDStream<String> stream = receiver.map(new Function<ConsumerRecord<String, String>, String>() {
            @Override
            public String call(ConsumerRecord<String, String> kafkaRecord) throws Exception {
                return kafkaRecord.value();
            }
        });

        stream.foreachRDD(x -> x.saveAsTextFile("/home/khouloud/Desktop/exemple/b")); // this does not do anything

        stream.foreachRDD(x -> {
            x.collect().stream().forEach(n -> System.out.println("item of list: " + n));
        }); // here too, I see nothing in the console

        stream.foreachRDD(rdd -> {
            if (rdd.isEmpty()) System.out.println("it's empty");
        }); // nothing is printed

        JavaPairDStream<Integer, List<Double>> points = stream.mapToPair(
                new PairFunction<String, Integer, List<Double>>() {
                    @Override
                    public Tuple2<Integer, List<Double>> call(String x) throws Exception {
                        // Records look like "<id> <double> <double> ...": item[0] is
                        // the key, the remaining tokens become the coordinate list.
                        String[] item = x.split(" ");
                        List<Double> l = new ArrayList<>();
                        for (int i = 1; i < item.length; i++) {
                            l.add(Double.parseDouble(item[i]));
                        }
                        return new Tuple2<>(Integer.parseInt(item[0]), l);
                    }
                });
    }
}
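Two things stand out before the error itself. First, the snippet never calls jssc.start(), and foreachRDD actions are only registered, never executed, until the context is started. A minimal sketch of the usual ending of such a method (assuming nothing else needs to happen first) is:

    jssc.start();              // begin consuming batches from Kafka
    jssc.awaitTermination();   // block until the streaming job is stopped

Second, the stack trace below is thrown while the DStream graph is still being built (inside a mapToPair call), so consumer() aborts before any batch could run; that by itself would explain why the three foreachRDD blocks stay silent.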
The error:
org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:340)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:330)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:156)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2294)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$map$1.apply(DStream.scala:547)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$map$1.apply(DStream.scala:547)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
    at org.apache.spark.streaming.StreamingContext.withScope(StreamingContext.scala:265)
    at org.apache.spark.streaming.dstream.DStream.map(DStream.scala:546)
    at org.apache.spark.streaming.api.java.JavaDStreamLike$class.mapToPair(JavaDStreamLike.scala:163)
    at org.apache.spark.streaming.api.java.AbstractJavaDStreamLike.mapToPair(JavaDStreamLike.scala:42)
    at Min.calculDegSim(Min.java:43)
    at SkyRule.execute(SkyRule.java:34)
    at Main3.consumer(Main3.java:159)
    at Executer$2.run(Executer.java:27)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.NotSerializableException: Graph is unexpectedly null when DStream is being serialized.
Serialization stack:
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:337)
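Judging by the trace, the exception is raised from Min.calculDegSim (code not shown), not from the snippet above, and the message "Graph is unexpectedly null when DStream is being serialized" typically means that a DStream object itself was captured inside a task closure, for example via an outer this or an object field that holds the stream. A minimal sketch of the usual fix, assuming the function only needs the values flowing through the stream (the class and method names here are hypothetical), is to keep the transformation in a static nested class that references no DStream and no enclosing instance:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.api.java.function.PairFunction;

    import scala.Tuple2;

    public class Transforms {
        // A static nested class captures no outer instance, so Spark only has
        // to serialize the function object itself, which holds no DStream.
        public static class ParsePoint implements PairFunction<String, Integer, List<Double>> {
            @Override
            public Tuple2<Integer, List<Double>> call(String x) {
                String[] item = x.split(" ");
                List<Double> l = new ArrayList<>();
                for (int i = 1; i < item.length; i++) {
                    l.add(Double.parseDouble(item[i]));
                }
                return new Tuple2<>(Integer.parseInt(item[0]), l);
            }
        }
    }

Usage would then be stream.mapToPair(new Transforms.ParsePoint()); anything like the static field pr must be derived from the result of the transformation rather than referenced inside call().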