I am reading data from a topic using Spark Streaming, and I am running into the following exception.
java.io.NotSerializableException: org.apache.kafka.clients.consumer.ConsumerRecord
Serialization stack:
	- object not serializable (class: org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic = rawEventTopic, partition = 0, offset = 14098, CreateTime = 1556113016951, serialized key size = -1, serialized value size = 2916, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = {"id":null,"message":null,"eventDate":"","group":null,"category":"AD","userName":null,"inboundDataSource":"AD", ... ,"dataSourceId":2,"date":"","violated":false,"oobjectId":null,"eventCategoryName":"AD","sourceDataType":"AD"}))
	- element of array (index: 0)
	- array (class [Lorg.apache.kafka.clients.consumer.ConsumerRecord;, size 1)
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_151]
	at java.lang.Thread.run(Unknown Source) [na:1.8.0_151]

2019-04-24 19:07:00.025 ERROR 21144 --- [result-getter-1] o.apache.spark.scheduler.TaskSetManager : Task 1.0 in stage 48.0 (TID 97) had a not serializable result: org.apache.kafka.clients.consumer.ConsumerRecord
The code used to read data from the topic is as follows:
@Service
public class RawEventSparkConsumer {

    private final Logger logger = LoggerFactory.getLogger(RawEventSparkConsumer.class);

    @Autowired
    private DataModelServiceImpl dataModelServiceImpl;

    @Autowired
    private JavaStreamingContext streamingContext;

    @Autowired
    private JavaInputDStream<ConsumerRecord<String, String>> messages;

    @Autowired
    private EnrichEventKafkaProducer enrichEventKafkaProd;

    @PostConstruct
    private void sparkRawEventConsumer() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(() -> {
            messages.foreachRDD((rdd) -> {
                // collect() pulls the ConsumerRecord objects from the executors to the driver
                List<ConsumerRecord<String, String>> rddList = rdd.collect();
                Iterator<ConsumerRecord<String, String>> rddIterator = rddList.iterator();
                while (rddIterator.hasNext()) {
                    ConsumerRecord<String, String> rddRecord = rddIterator.next();
                    if (rddRecord.topic().toString().equalsIgnoreCase("rawEventTopic")) {
                        ObjectMapper mapper = new ObjectMapper();
                        BaseDataModel csvDataModel = mapper.readValue(rddRecord.value(), BaseDataModel.class);
                        EnrichEventDataModel enrichEventDataModel = (EnrichEventDataModel) csvDataModel;
                        enrichEventKafkaProd.sendEnrichEvent(enrichEventDataModel);
                    } else if (rddRecord.topic().toString().equalsIgnoreCase("enrichEventTopic")) {
                        System.out.println("************getting enrichEventTopic data ************************");
                    }
                }
            });

            streamingContext.start();
            try {
                streamingContext.awaitTermination();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        });
    }
}
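The collect() call above is what forces Spark to serialize whole ConsumerRecord objects (which do not implement Serializable) back to the driver. A minimal alternative sketch, assuming the same topic name and the BaseDataModel Jackson class from the code above (imports mirror the original class), maps each record to its String value on the executors so only serializable Strings are collected:

// Hypothetical variant of the foreachRDD body: extract the String payload on the
// executors, so collect() only ships Strings (which are serializable) to the driver
// instead of ConsumerRecord objects.
messages.foreachRDD(rdd -> {
    List<String> rawEvents = rdd
            .filter(record -> record.topic().equalsIgnoreCase("rawEventTopic"))
            .map(ConsumerRecord::value)   // drop the non-serializable ConsumerRecord wrapper
            .collect();

    ObjectMapper mapper = new ObjectMapper();
    for (String json : rawEvents) {
        BaseDataModel csvDataModel = mapper.readValue(json, BaseDataModel.class);
        // ... hand the model to the enrich-event producer as in the original code
    }
});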
Here is the configuration code:
@Bean
public JavaInputDStream<ConsumerRecord<String, String>> getKafkaParam(JavaStreamingContext streamingContext) {
    Map<String, Object> kafkaParams = new HashedMap();
    kafkaParams.put("bootstrap.servers", "localhost:9092");
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "group1");
    kafkaParams.put("auto.offset.reset", "latest");
    kafkaParams.put("enable.auto.commit", false);

    Collection<String> topics = Arrays.asList(rawEventTopic, enrichEventTopic);

    return KafkaUtils.createDirectStream(
            streamingContext,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
    );
}
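The JavaStreamingContext that this bean receives is autowired, but its definition is not shown in the question. For context only, a typical definition might look like the sketch below; the app name, master URL and 5-second batch interval are assumptions, not taken from the question:

// Assumed companion bean: master URL, app name and batch interval are guesses.
@Bean
public JavaStreamingContext javaStreamingContext() {
    SparkConf conf = new SparkConf()
            .setAppName("raw-event-consumer")
            .setMaster("local[*]");
    return new JavaStreamingContext(conf, Durations.seconds(5));
}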
Please help; I am stuck at this point.
Answer 0 (score: 0)
I found the solution to my problem at the link below:
org.apache.spark.SparkException: Task not serializable
Declare the inner class as a static variable:
static Function<Tuple2<String, String>, String> mapFunc = new Function<Tuple2<String, String>, String>() {
    @Override
    public String call(Tuple2<String, String> tuple2) {
        return tuple2._2();
    }
};
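Besides making the anonymous Function a static field, another workaround commonly suggested for the ConsumerRecord case is to serialize task results with Kryo instead of Java serialization. A sketch, not verified against the setup above (app name and master URL are placeholders):

// Sketch: register ConsumerRecord with Kryo so that collect() can ship the
// records back to the driver without relying on Java serialization.
SparkConf conf = new SparkConf()
        .setAppName("raw-event-consumer")
        .setMaster("local[*]")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .registerKryoClasses(new Class<?>[] { ConsumerRecord.class });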