I am trying to produce and consume Avro messages with Kafka through the Spark Streaming API, but Avro throws an "object not serializable" exception. I tried wrapping the data in an AvroKey wrapper; it still does not work.
Producer code:
public static final String schema = "{"
        + "\"fields\": ["
        + "  { \"name\": \"str1\", \"type\": \"string\" },"
        + "  { \"name\": \"str2\", \"type\": \"string\" },"
        + "  { \"name\": \"int1\", \"type\": \"int\" }"
        + "],"
        + "\"name\": \"myrecord\","
        + "\"type\": \"record\""
        + "}";

public static void startAvroProducer() throws InterruptedException, IOException {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
    props.put(ProducerConfig.CLIENT_ID_CONFIG, "Kafka Avro Producer");

    Schema.Parser parser = new Schema.Parser();
    Schema schema = parser.parse(AvroProducer.schema);

    AvroKey<GenericRecord> k = new AvroKey<GenericRecord>();
    GenericRecord datum = new GenericData.Record(schema);
    datum.put("str1", "phani");
    datum.put("str2", "kumar");
    datum.put("int1", 1);
    k.datum(datum);

    GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    Encoder e = EncoderFactory.get().binaryEncoder(os, null);
    writer.write(k.datum(), e);
    e.flush();
    byte[] bytedata = os.toByteArray();

    KafkaProducer<String, byte[]> producer = new KafkaProducer<String, byte[]>(props);
    ProducerRecord<String, byte[]> producerRec = new ProducerRecord<String, byte[]>("jason", bytedata);
    producer.send(producerRec);
    producer.close();
}
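For reference, the schema string above assembles into the following Avro record schema:
{
  "fields": [
    { "name": "str1", "type": "string" },
    { "name": "str2", "type": "string" },
    { "name": "int1", "type": "int" }
  ],
  "name": "myrecord",
  "type": "record"
}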
Consumer code:
private static SparkConf sc = null;
private static JavaSparkContext jsc = null;
private static JavaStreamingContext jssc = null;

public static void startAvroConsumer() throws InterruptedException {
    sc = new SparkConf().setAppName("Spark Avro Streaming Consumer")
            .setMaster("local[*]");
    jsc = new JavaSparkContext(sc);
    jssc = new JavaStreamingContext(jsc, new Duration(200));

    Schema.Parser parser = new Schema.Parser();
    Schema schema = parser.parse(AvroProducer.schema);

    Set<String> topics = Collections.singleton("jason");
    Map<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", "localhost:9092");
    kafkaParams.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
    kafkaParams.put("value.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");

    JavaPairInputDStream<String, byte[]> inputDstream = KafkaUtils
            .createDirectStream(jssc, String.class, byte[].class,
                    StringDecoder.class, DefaultDecoder.class, kafkaParams,
                    topics);

    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(
            schema);

    inputDstream.map(message -> {
        ByteArrayInputStream bis = new ByteArrayInputStream(message._2);
        Decoder decoder = DecoderFactory.get().binaryDecoder(bis, null);
        GenericRecord record = reader.read(null, decoder);
        String str1 = getValue(record, "str1", String.class);
        String str2 = getValue(record, "str2", String.class);
        int int1 = getValue(record, "int1", Integer.class);
        return str1 + " " + str2 + " " + int1;
    }).print();

    jssc.start();
    jssc.awaitTermination();
}
@SuppressWarnings("unchecked")
public static <T> T getValue(GenericRecord genericRecord, String name,
        Class<T> clazz) {
    Object obj = genericRecord.get(name);
    if (obj == null)
        return null;
    if (obj.getClass() == Utf8.class) {
        return (T) obj.toString();
    }
    if (obj.getClass() == Integer.class) {
        return (T) obj;
    }
    return null;
}
Exception:
Caused by: java.io.NotSerializableException: org.apache.avro.generic.GenericDatumReader
Serialization stack:
- object not serializable (class: org.apache.avro.generic.GenericDatumReader, value: org.apache.avro.generic.GenericDatumReader@7da8db47)
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 1)
- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class com.applications.streaming.consumers.AvroConsumer, functionalInterfaceMethod=org/apache/spark/api/java/function/Function.call:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic com/applications/streaming/consumers/AvroConsumer.lambda$0:(Lorg/apache/avro/generic/GenericDatumReader;Lscala/Tuple2;)Ljava/lang/String;, instantiatedMethodType=(Lscala/Tuple2;)Ljava/lang/String;, numCaptured=1])
- writeReplace data (class: java.lang.invoke.SerializedLambda)
- object (class com.applications.streaming.consumers.AvroConsumer$$Lambda$13/1805404637, com.applications.streaming.consumers.AvroConsumer$$Lambda$13/1805404637@aa31e58)
- field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
- object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
... 15 more
From reading various blog posts, my understanding is that Avro objects do not implement the Serializable interface. However, according to the JIRA below,
https://issues.apache.org/jira/browse/AVRO-1502
the issue has already been resolved, yet I am still hitting this exception.
Is there a way to fix this?
Answer 0 (score: 1)
Your problem is that you are referencing the following object inside the lambda function:
GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(
        schema);
GenericDatumReader is not serializable. You have two options: move the instantiation of the object inside the map function (not a good option), or make the object a static member of the class. That forces a new object to be created per executor (one per JVM). Since you are using a precompiled schema, you can easily create the instance in a static initializer, like this:
static GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(new Schema.Parser().parse(AvroProducer.schema));
or
static GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(AvroProducer.$SCHEMA);
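Putting this together, here is a minimal sketch of the corrected consumer (assuming the same AvroProducer.schema string, topic, and map logic shown in the question); the lambda now refers to a static field instead of capturing a local instance:
// Static field: never captured by the lambda, so nothing non-serializable
// is shipped to the executors; each executor JVM initializes its own copy.
private static final GenericDatumReader<GenericRecord> reader =
        new GenericDatumReader<GenericRecord>(
                new Schema.Parser().parse(AvroProducer.schema));

// Inside startAvroConsumer(), the map function stays unchanged:
inputDstream.map(message -> {
    ByteArrayInputStream bis = new ByteArrayInputStream(message._2);
    Decoder decoder = DecoderFactory.get().binaryDecoder(bis, null);
    GenericRecord record = reader.read(null, decoder); // resolves to the static field
    return getValue(record, "str1", String.class) + " "
            + getValue(record, "str2", String.class) + " "
            + getValue(record, "int1", Integer.class);
}).print();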
Answer 1 (score: 0)
From the consumer code:
kafkaParams.put("key.deserializer",
"org.apache.kafka.common.serialization.StringSerializer");
kafkaParams.put("value.deserializer",
"org.apache.kafka.common.serialization.ByteArraySerializer");
You can see that serializer classes were set where deserializer classes are required.
Deserializers to be used: StringDeserializer for the key, ByteArrayDeserializer for the value.
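For reference, the corrected settings would be:
kafkaParams.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("value.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");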
A general comment: Avro data on Kafka really needs to be handled with some kind of schema registry service, because Avro schemas can evolve over time.
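For illustration, here is a minimal producer sketch using Confluent's Schema Registry serializer. This is an assumption rather than part of the original setup: it presumes the kafka-avro-serializer dependency is on the classpath and a registry is running at http://localhost:8081.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// KafkaAvroSerializer registers the schema with the registry and embeds its id in each message
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("schema.registry.url", "http://localhost:8081"); // assumed registry endpoint

KafkaProducer<String, GenericRecord> producer = new KafkaProducer<String, GenericRecord>(props);
// "datum" is the GenericRecord built exactly as in the producer code above
producer.send(new ProducerRecord<String, GenericRecord>("jason", datum));
producer.close();
With this approach the consumer can look up the writer's schema by id, so producers and consumers can evolve their schemas independently instead of sharing a hard-coded schema string.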