What are the dependencies of Structured Streaming with Kafka?

Date: 2018-06-08 21:34:00

Tags: apache-spark apache-kafka spark-structured-streaming

I have a Kafka and Spark application for Structured Streaming. In particular, my KafkaProducer has the following configuration:

props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBrokerEndpoint);
props.put(ProducerConfig.CLIENT_ID_CONFIG, "KafkaProducer");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(props);

Then I create a ProducerRecord as follows:

ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, json.toString());
kafkaProducer.send(record);
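
For reference, a ProducerRecord can also carry an explicit key (the "some-key" below is purely illustrative); with the single-argument constructor above the key is null, so the CAST(key AS STRING) column read by Spark further down will also be null:

// Illustrative only: attach a key so the key column on the Spark side is not null.
ProducerRecord<String, String> keyedRecord =
        new ProducerRecord<String, String>(topic, "some-key", json.toString());
kafkaProducer.send(keyedRecord);
kafkaProducer.flush(); // ensure buffered records actually reach the broker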

where json.toString() is a String in JSON format, i.e. the value I want to process in Spark. What I basically do next is connect Spark to the Kafka topic, as described in the official Spark Structured Streaming guide:

Dataset<Row> df =  sparkSession
      .readStream()
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "kafkaToSparkTopic")
      .load();

and then

Dataset<Row> query = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
query.writeStream().format("console").start();

I get the following output and exception:

        === Result of Batch Resolution ===
    !'DeserializeToObject unresolveddeserializer(createexternalrow(getcolumnbyordinal(0, BinaryType), getcolumnbyordinal(1, BinaryType), getcolumnbyordinal(2, StringType).toString, getcolumnbyordinal(3, IntegerType), getcolumnbyordinal(4, LongType), staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, ObjectType(class java.sql.Timestamp), toJavaTimestamp, getcolumnbyordinal(5, TimestampType), true), getcolumnbyordinal(6, IntegerType), StructField(key,BinaryType,true), StructField(value,BinaryType,true), StructField(topic,StringType,true), StructField(partition,IntegerType,true), StructField(offset,LongType,true), StructField(timestamp,TimestampType,true), StructField(timestampType,IntegerType,true))), obj#14: org.apache.spark.sql.Row   DeserializeToObject createexternalrow(key#0, value#1, topic#2.toString, partition#3, offset#4L, staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, ObjectType(class java.sql.Timestamp), toJavaTimestamp, timestamp#5, true), timestampType#6, StructField(key,BinaryType,true), StructField(value,BinaryType,true), StructField(topic,StringType,true), StructField(partition,IntegerType,true), StructField(offset,LongType,true), StructField(timestamp,TimestampType,true), StructField(timestampType,IntegerType,true)), obj#14: org.apache.spark.sql.Row
     +- LocalRelation <empty>, [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             +- LocalRelation <empty>, [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6]

    23:16:43.465 [main] INFO org.apache.spark.sql.execution.SparkSqlParser - Parsing command: CAST(key AS STRING)
    23:16:44.298 [main] INFO org.apache.spark.sql.execution.SparkSqlParser - Parsing command: CAST(value AS STRING)
    23:16:44.398 [main] DEBUG org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences - Resolving 'key to key#0
    23:16:44.401 [main] DEBUG org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences - Resolving 'value to value#1
    23:16:44.496 [main] DEBUG org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1 - 
    === Result of Batch Resolution ===
    !'Project [unresolvedalias(cast('key as string), None), unresolvedalias(cast('value as string), None)]                                                                                                                                                                                  Project [cast(key#0 as string) AS key#15, cast(value#1 as string) AS value#16]
     +- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@5a1f778,kafka,List(),None,List(),None,Map(subscribe -> kafkaToSparkTopic, kafka.bootstrap.servers -> localhost:9092),None), kafka, [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6]   +- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@5a1f778,kafka,List(),None,List(),None,Map(subscribe -> kafkaToSparkTopic, kafka.bootstrap.servers -> localhost:9092),None), kafka, [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6]

    23:16:44.557 [main] DEBUG org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1 - 
    === Result of Batch Resolution ===
    !'DeserializeToObject unresolveddeserializer(createexternalrow(getcolumnbyordinal(0, StringType).toString, getcolumnbyordinal(1, StringType).toString, StructField(key,StringType,true), StructField(value,StringType,true))), obj#19: org.apache.spark.sql.Row   DeserializeToObject createexternalrow(key#15.toString, value#16.toString, StructField(key,StringType,true), StructField(value,StringType,true)), obj#19: org.apache.spark.sql.Row
     +- LocalRelation <empty>, [key#15, value#16]                                                                                                                                                                                                                     +- LocalRelation <empty>, [key#15, value#16]

    23:16:44.796 [main] DEBUG org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1 - 
    === Result of Batch Resolution ===
    !'DeserializeToObject unresolveddeserializer(createexternalrow(getcolumnbyordinal(0, StringType).toString, getcolumnbyordinal(1, StringType).toString, StructField(key,StringType,true), StructField(value,StringType,true))), obj#22: org.apache.spark.sql.Row   DeserializeToObject createexternalrow(key#15.toString, value#16.toString, StructField(key,StringType,true), StructField(value,StringType,true)), obj#22: org.apache.spark.sql.Row
     +- LocalRelation <empty>, [key#15, value#16]                                                                                                                                                                                                                     +- LocalRelation <empty>, [key#15, value#16]

    23:16:46.660 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction as:alberto (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:331)
    23:16:46.782 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction as:alberto (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:331)
    23:16:46.804 [main] INFO org.apache.spark.sql.execution.streaming.StreamExecution - Starting [id = 1a32e91e-4a23-4089-9343-d7940834b98d, runId = 5313abfb-6748-4f51-9c4e-f384db1e9346]. Use /tmp/temporary-4d94a508-a944-4447-9db9-413a210d7212 to store the query checkpoint.
    23:16:47.191 [Thread-2] INFO org.apache.spark.SparkContext - Invoking stop() from shutdown hook
    23:16:47.256 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping org.spark_project.jetty.server.Server@60fa3495
    23:16:47.257 [Thread-2] DEBUG org.spark_project.jetty.server.Server - doStop org.spark_project.jetty.server.Server@60fa3495
    23:16:47.300 [SparkUI-28] DEBUG org.spark_project.jetty.util.thread.QueuedThreadPool - ran SparkUI-28-acceptor-0@460f76a6-ServerConnector@71104a4{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
    23:16:47.316 [stream execution thread for [id = 1a32e91e-4a23-4089-9343-d7940834b98d, runId = 5313abfb-6748-4f51-9c4e-f384db1e9346]] ERROR org.apache.spark.sql.execution.streaming.StreamExecution - Query [id = 1a32e91e-4a23-4089-9343-d7940834b98d, runId = 5313abfb-6748-4f51-9c4e-f384db1e9346] terminated with error
    java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
        at org.apache.spark.sql.kafka010.KafkaSourceProvider$$anonfun$3.apply(KafkaSourceProvider.scala:82)
        at org.apache.spark.sql.kafka010.KafkaSourceProvider$$anonfun$3.apply(KafkaSourceProvider.scala:82)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
        at scala.collection.SetLike$class.map(SetLike.scala:93)
        at scala.collection.AbstractSet.map(Set.scala:47)
        at org.apache.spark.sql.kafka010.KafkaSourceProvider.createSource(KafkaSourceProvider.scala:82)
        at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:243)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2$$anonfun$applyOrElse$1.apply(StreamExecution.scala:158)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2$$anonfun$applyOrElse$1.apply(StreamExecution.scala:155)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:155)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:153)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
        at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan$lzycompute(StreamExecution.scala:153)
        at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan(StreamExecution.scala:147)
        at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:276)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
    23:16:47.317 [Thread-2] DEBUG org.spark_project.jetty.server.Server - Graceful shutdown org.spark_project.jetty.server.Server@60fa3495 by 
    23:16:47.325 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping Spark@71104a4{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
    23:16:47.325 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping org.spark_project.jetty.server.ServerConnector$ServerConnectorManager@6e9319f
    23:16:47.326 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=0 selected=0
    23:16:47.326 [Thread-2] DEBUG org.spark_project.jetty.io.ManagedSelector - Stopping org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=0 selected=0
    23:16:47.355 [Thread-2] DEBUG org.spark_project.jetty.io.ManagedSelector - Queued change org.spark_project.jetty.io.ManagedSelector$CloseEndPoints@3a133be0 on org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=0 selected=0
    23:16:47.356 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Selector loop woken up from select, 0/0 selected
    23:16:47.357 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Running change org.spark_project.jetty.io.ManagedSelector$CloseEndPoints@3a133be0
    23:16:47.357 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Closing 0 endPoints on org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=0 selected=0
    23:16:47.357 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Closed 0 endPoints on org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=0 selected=0
    23:16:47.357 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Selector loop waiting on select
    23:16:47.358 [Thread-2] DEBUG org.spark_project.jetty.io.ManagedSelector - Queued change org.spark_project.jetty.io.ManagedSelector$CloseSelector@33ed88dc on org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=0 selected=0
    23:16:47.358 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Selector loop woken up from select, 0/0 selected
    23:16:47.358 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Running change org.spark_project.jetty.io.ManagedSelector$CloseSelector@33ed88dc
    23:16:47.359 [SparkUI-27] DEBUG org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume - EPC Prod/org.spark_project.jetty.io.ManagedSelector$SelectorProducer@caf8d6 produced null
    23:16:47.359 [SparkUI-27] DEBUG org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume - EPC Idle/org.spark_project.jetty.io.ManagedSelector$SelectorProducer@caf8d6 produce exit
    23:16:47.359 [SparkUI-27] DEBUG org.spark_project.jetty.util.thread.QueuedThreadPool - ran org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=-1 selected=-1
    23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.io.ManagedSelector - Stopped org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=-1 selected=-1
    23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=-1 selected=-1
    23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED org.spark_project.jetty.server.ServerConnector$ServerConnectorManager@6e9319f
    23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping HttpConnectionFactory@5d25e6bb[HTTP/1.1]
    23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED HttpConnectionFactory@5d25e6bb[HTTP/1.1]
    23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping org.spark_project.jetty.util.thread.ScheduledExecutorScheduler@4985cbcb
    23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED org.spark_project.jetty.util.thread.ScheduledExecutorScheduler@4985cbcb
    23:16:47.359 [Thread-2] INFO org.spark_project.jetty.server.AbstractConnector - Stopped Spark@71104a4{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
    23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED Spark@71104a4{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
    23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.server.handler.AbstractHandler - stopping org.spark_project.jetty.server.Server@60fa3495
    23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping org.spark_project.jetty.server.handler.ContextHandlerCollection@89ff02e[org.spark_project.jetty.server.handler.gzip.GzipHandler@21526f6c, org.spark_project.jetty.server.handler.gzip.GzipHandler@2c715e84,org.spark_project.jetty.server.handler.gzip.GzipHandler@29876704, org.spark_project.jetty.server.handler.gzip.GzipHandler@379ab47b,  org.spark_project.jetty.server.handler.gzip.GzipHandler@3b366632, o.s.j.s.ServletContextHandler@63998bf4{/metrics/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@736ac09a{/SQL,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@3b0ca5e1{/SQL/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@5f78de22{/SQL/execution,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@272a179c{/SQL/execution/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@56781d96{/static/sql,null,SHUTDOWN,@Spark}]
    23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.server.handler.AbstractHandler - stopping org.spark_project.jetty.server.handler.ContextHandlerCollection@89ff02e[org.spark_project.jetty.server.handler.gzip.GzipHandler@21526f6c, org.spark_project.jetty.server.handler.gzip.GzipHandler@2c715e84, org.spark_project.jetty.server.handler.gzip.GzipHandler@70fab835, org.spark_project.jetty.server.handler.gzip.GzipHandler@64712be, org.spark_project.jetty.server.handler.gzip.GzipHandler@5ae81e1, org.spark_project.jetty.server.handler.gzip.GzipHandler@54709809, org.spark_project.jetty.server.handler.gzip.GzipHandler@48c40605, org.spark_project.jetty.server.handler.gzip.GzipHandler@21ec5d87, org.spark_project.jetty.server.handler.gzip.GzipHandler@4b21844c, org.spark_project.jetty.server.handler.gzip.GzipHandler@29876704,
org.spark_project.jetty.server.handler.gzip.GzipHandler@67427b69, org.spark_project.jetty.server.handler.gzip.GzipHandler@56102e1c, org.spark_project.jetty.server.handler.gzip.GzipHandler@3b366632, o.s.j.s.ServletContextHandler@63998bf4{/metrics/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@736ac09a{/SQL,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@3b0ca5e1{/SQL/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@5f78de22{/SQL/execution,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@272a179c{/SQL/execution/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@56781d96{/static/sql,null,SHUTDOWN,@Spark}]
    23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED org.spark_project.jetty.server.handler.ContextHandlerCollection@89ff02e[org.spark_project.jetty.server.handler.gzip.GzipHandler@21526f6c, org.spark_project.jetty.server.handler.gzip.GzipHandler@2c715e84, org.spark_project.jetty.server.handler.gzip.GzipHandler@70fab835, org.spark_project.jetty.server.handler.gzip.GzipHandler@64712be, org.spark_project.jetty.server.handler.gzip.GzipHandler@5ae81e1, org.spark_project.jetty.server.handler.gzip.GzipHandler@54709809, org.spark_project.jetty.server.handler.gzip.GzipHandler@48c40605, org.spark_project.jetty.server.handler.gzip.GzipHandler@21ec5d87, org.spark_project.jetty.server.handler.gzip.GzipHandler@4b21844c, org.spark_project.jetty.server.handler.gzip.GzipHandler@29876704, org.spark_project.jetty.server.handler.gzip.GzipHandler@379ab47b, org.spark_project.jetty.server.handler.gzip.GzipHandler@7cc586a8, org.spark_project.jetty.server.handler.gzip.GzipHandler@2f4854d6, org.spark_project.jetty.server.handler.gzip.GzipHandler@388ffbc2, org.spark_project.jetty.server.handler.gzip.GzipHandler@21d5c1a0, org.spark_project.jetty.server.handler.gzip.GzipHandler@3ec11999, org.spark_project.jetty.server.handler.gzip.GzipHandler@67ef029, org.spark_project.jetty.server.handler.gzip.GzipHandler@560cbf1a, org.spark_project.jetty.server.handler.gzip.GzipHandler@7a11c4c7, org.spark_project.jetty.server.handler.gzip.GzipHandler@b5cc23a, org.spark_project.jetty.server.handler.gzip.GzipHandler@660e9100, org.spark_project.jetty.server.handler.gzip.GzipHandler@16fb356, org.spark_project.jetty.server.handler.gzip.GzipHandler@67427b69, org.spark_project.jetty.server.handler.gzip.GzipHandler@56102e1c, org.spark_project.jetty.server.handler.gzip.GzipHandler@3b366632, o.s.j.s.ServletContextHandler@63998bf4{/metrics/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@736ac09a{/SQL,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@3b0ca5e1{/SQL/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@5f78de22{/SQL/execution,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@272a179c{/SQL/execution/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@56781d96{/static/sql,null,SHUTDOWN,@Spark}]
    23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping org.spark_project.jetty.server.handler.ErrorHandler@29a60c27
    23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.server.handler.AbstractHandler - stopping org.spark_project.jetty.server.handler.ErrorHandler@29a60c27
    23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED org.spark_project.jetty.server.handler.ErrorHandler@29a60c27
    23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping SparkUI{STARTED,8<=8<=200,i=8,q=0}
    23:16:47.430 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED SparkUI{STOPPED,8<=8<=200,i=0,q=0}
    23:16:47.443 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED org.spark_project.jetty.server.Server@60fa3495
    Exception in thread "stream execution thread for [id = 1a32e91e-4a23-4089-9343-d7940834b98d, runId = 5313abfb-6748-4f51-9c4e-f384db1e9346]" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
        at org.apache.spark.sql.kafka010.KafkaSourceProvider$$anonfun$3.apply(KafkaSourceProvider.scala:82)
        at org.apache.spark.sql.kafka010.KafkaSourceProvider$$anonfun$3.apply(KafkaSourceProvider.scala:82)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
        at scala.collection.SetLike$class.map(SetLike.scala:93)
        at scala.collection.AbstractSet.map(Set.scala:47)
        at org.apache.spark.sql.kafka010.KafkaSourceProvider.createSource(KafkaSourceProvider.scala:82)
        at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:243)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2$$anonfun$applyOrElse$1.apply(StreamExecution.scala:158)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2$$anonfun$applyOrElse$1.apply(StreamExecution.scala:155)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:155)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:153)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
        at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan$lzycompute(StreamExecution.scala:153)
        at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan(StreamExecution.scala:147)
        at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:276)
        at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
    23:16:47.515 [Thread-2] INFO org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://192.168.10.1:4040
    23:16:47.706 [dispatcher-event-loop-1] INFO org.apache.spark.MapOutputTrackerMasterEndpoint - MapOutputTrackerMasterEndpoint stopped!

First of all: are the === Result of Batch Resolution === output and the unresolved DeserializeToObject deserializer correct? And how can I extract the value field, i.e. the String holding the JSON I am interested in?

The pom.xml file is:

<dependencies>
    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka</artifactId>
        <version>2.1.6.RELEASE</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
        <version>2.0.2.RELEASE</version>
    </dependency>

    <dependency>
        <groupId>com.satori</groupId>
        <artifactId>satori-rtm-sdk</artifactId>
        <version>1.0.3</version>
    </dependency>

    <dependency>
        <groupId>com.satori</groupId>
        <artifactId>satori-rtm-sdk-core</artifactId>
        <version>1.0.3</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka_2.10</artifactId>
        <version>0.9.0-incubating</version>
    </dependency>

    <!-- This is for KafkaUtils.createDirectStream -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka_2.10</artifactId>
        <version>1.3.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.10</artifactId>
        <version>2.0.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.10</artifactId>
        <version>2.0.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>2.2.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>2.2.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>2.2.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>

</dependencies>

From spark-shell I have Spark 2.3.0 and Scala 2.11.8 (Java 1.8.0_171), and I am currently running kafka_2.11-1.1.0 (so Kafka 1.1.0).

1 Answer:

Answer 0 (score: 1)

You are running into jar incompatibilities: you are mixing not only multiple incompatible Spark versions but also incompatible Scala versions.

Your pom.xml should basically contain only the following dependencies:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.3.1</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>2.3.1</version>
</dependency>

Note the _2.11 in the artifactId, which is the Scala version, and 2.3.1, which is the Spark version, e.g. in spark-sql_2.11.

spark-sql_2.11 provides Spark Structured Streaming, while spark-sql-kafka-0-10_2.11 provides the Spark connector for Apache Kafka.
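
On the second part of the question (extracting the JSON value): once the dependencies line up, one common pattern is to parse the value column with from_json. The sketch below uses a hypothetical schema (the actual JSON layout is not shown in the question), assumes df is the Kafka streaming Dataset from above, and runs inside a method that declares throws StreamingQueryException; adapt it to the real payload:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

// Hypothetical schema -- replace with the real layout of the JSON payload.
StructType schema = new StructType()
        .add("sensorId", DataTypes.StringType)
        .add("reading", DataTypes.DoubleType);

Dataset<Row> parsed = df
        .selectExpr("CAST(value AS STRING) AS json")           // value arrives as bytes
        .select(from_json(col("json"), schema).alias("data"))  // parse the JSON string
        .select("data.*");                                      // flatten into columns

// Keep the driver alive until the streaming query terminates.
parsed.writeStream()
        .format("console")
        .start()
        .awaitTermination();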