Getting NotSerializableException when using Spark Streaming with Kafka

Asked: 2019-04-24 13:47:05

Tags: java apache-spark apache-kafka spark-streaming

I am using Spark Streaming to read data from a topic and am running into the following exception:

  

java.io.NotSerializableException: org.apache.kafka.clients.consumer.ConsumerRecord
Serialization stack:
    - object not serializable (class: org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic = rawEventTopic, partition = 0, offset = 14098, CreateTime = 1556113016951, serialized key size = -1, serialized value size = 2916, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = {"id":null,"message":null,"eventDate":"","group":null,"category":"AD","userName":null,"inboundDataSource":"AD", ... , "dataSourceId":2,"date":"","violated":false,"oobjectId":null,"eventCategoryName":"AD","sourceDataType":"AD"}))
    - element of array (index: 0)
    - array (class [Lorg.apache.kafka.clients.consumer.ConsumerRecord;, size 1)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_151]
    at java.lang.Thread.run(Unknown Source) [na:1.8.0_151]

2019-04-24 19:07:00.025 ERROR 21144 --- [result-getter-1] o.apache.spark.scheduler.TaskSetManager : Task 1.0 in stage 48.0 (TID 97) had a not serializable result: org.apache.kafka.clients.consumer.ConsumerRecord

The code used to read data from the topics is as follows:

 @Service
public class RawEventSparkConsumer {
    private final Logger logger = LoggerFactory.getLogger(RawEventSparkConsumer.class);

    @Autowired
    private DataModelServiceImpl dataModelServiceImpl;

    @Autowired
    private JavaStreamingContext streamingContext;

    @Autowired
    private JavaInputDStream<ConsumerRecord<String, String>> messages;

    @Autowired
    private EnrichEventKafkaProducer enrichEventKafkaProd;

    @PostConstruct
    private void sparkRawEventConsumer() {

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(() -> {

            messages.foreachRDD((rdd) -> {

                List<ConsumerRecord<String, String>> rddList = rdd.collect();
                Iterator<ConsumerRecord<String, String>> rddIterator = rddList.iterator();
                while (rddIterator.hasNext()) {
                    ConsumerRecord<String, String> rddRecord = rddIterator.next();

                    if (rddRecord.topic().toString().equalsIgnoreCase("rawEventTopic")) {
                        ObjectMapper mapper = new ObjectMapper();
                        BaseDataModel csvDataModel = mapper.readValue(rddRecord.value(), BaseDataModel.class);
                        EnrichEventDataModel enrichEventDataModel = (EnrichEventDataModel) csvDataModel;
                        enrichEventKafkaProd.sendEnrichEvent(enrichEventDataModel);

                    } else if (rddRecord.topic().toString().equalsIgnoreCase("enrichEventTopic")) {
                        System.out.println("************getting enrichEventTopic data ************************");
                    }

                }

            });

            streamingContext.start();

            try {
                streamingContext.awaitTermination();
            } catch (InterruptedException e) { // TODO Auto-generated catch block
                e.printStackTrace();
            }
        });

    }
}

Here is the configuration code:

@Bean
public JavaInputDStream<ConsumerRecord<String, String>> getKafkaParam(JavaStreamingContext streamingContext) {
    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "localhost:9092");
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "group1");
    kafkaParams.put("auto.offset.reset", "latest");
    kafkaParams.put("enable.auto.commit", false);
    Collection<String> topics = Arrays.asList(rawEventTopic, enrichEventTopic);

    return KafkaUtils.createDirectStream(
            streamingContext,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
    );
}
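
The JavaStreamingContext bean that this configuration autowires is not shown in the question. A minimal sketch of how such a bean is commonly defined is below; the app name, master URL, and batch interval are assumptions for illustration, not taken from the original post:

// Uses org.apache.spark.SparkConf, org.apache.spark.streaming.Durations,
// and org.apache.spark.streaming.api.java.JavaStreamingContext.
@Bean
public JavaStreamingContext getJavaStreamingContext() {
    // Assumed settings: local master and a 5-second batch interval.
    SparkConf sparkConf = new SparkConf()
            .setAppName("RawEventSparkConsumer")
            .setMaster("local[*]");
    return new JavaStreamingContext(sparkConf, Durations.seconds(5));
}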

Please help; I am stuck at this point.

1 Answer:

Answer 0 (score: 0)

I found the solution to my problem at the link below:

org.apache.spark.SparkException: Task not serializable

Declare the inner class as a static variable:

static Function<Tuple2<String, String>, String> mapFunc=new Function<Tuple2<String, String>, String>() {
    @Override
    public String call(Tuple2<String, String> tuple2) {
        return tuple2._2();
    }
};
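
A short usage note: the static function above operates on (key, value) Tuple2 pairs rather than the ConsumerRecord objects produced by the direct stream in the question. Applied to that stream, the same idea is to declare a static, serializable function that maps each record to its String value on the executors, so that no ConsumerRecord has to be serialized back to the driver. The following is an illustrative sketch of that approach, not code from the linked answer:

// Static, serializable function that extracts only the record value.
static Function<ConsumerRecord<String, String>, String> valueFunc =
        new Function<ConsumerRecord<String, String>, String>() {
            @Override
            public String call(ConsumerRecord<String, String> record) {
                return record.value();
            }
        };

// Usage: map before collect, so only plain Strings cross the executor/driver boundary.
messages.foreachRDD(rdd -> {
    List<String> values = rdd.map(valueFunc).collect();
    // parse each JSON string here (e.g. with ObjectMapper) and forward it
});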