I have a Kafka and Spark Structured Streaming application. In particular, my KafkaProducer has the following configuration:
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBrokerEndpoint);
props.put(ProducerConfig.CLIENT_ID_CONFIG, "KafkaProducer");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(props);
Then I create a ProducerRecord as follows:
ProducerRecord<String, String> record = new ProducerRecord<>(topic, json.toString());
kafkaProducer.send(record);
where json.toString() is a String in JSON format, i.e. the value I want to process in Spark.
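For completeness, a minimal sketch of the whole send path (the explicit record key, the payload contents, and the final flush() are illustrative additions here, not requirements of my setup):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    String json = "{\"sensor\":\"s1\",\"value\":42}"; // hypothetical payload
    // Without a record key, Spark's CAST(key AS STRING) will simply yield null.
    producer.send(new ProducerRecord<>("kafkaToSparkTopic", "s1", json));
    producer.flush(); // make sure buffered records actually reach the broker
}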
What I basically do now is hook Spark up to the Kafka topic, as described in the official Spark Structured Streaming guide:
Dataset<Row> df = sparkSession
.readStream()
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("subscribe", "kafkaToSparkTopic")
.load();
and then
Dataset<Row> query = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
query.writeStream().format("console").start();
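For reference, a sketch of how the query is kept alive (using the standard StreamingQuery API; start() returns a StreamingQuery, and awaitTermination() blocks the main thread until the query stops or fails, otherwise the driver exits and the shutdown hook seen in the log below fires):

import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.StreamingQueryException;

StreamingQuery streamingQuery = query.writeStream()
        .format("console")
        .start();
// Blocks until the query terminates; throws StreamingQueryException
// (a checked exception in Java) if the stream fails.
streamingQuery.awaitTermination();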
I get the following output and exception:
=== Result of Batch Resolution ===
!'DeserializeToObject unresolveddeserializer(createexternalrow(getcolumnbyordinal(0, BinaryType), getcolumnbyordinal(1, BinaryType), getcolumnbyordinal(2, StringType).toString, getcolumnbyordinal(3, IntegerType), getcolumnbyordinal(4, LongType), staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, ObjectType(class java.sql.Timestamp), toJavaTimestamp, getcolumnbyordinal(5, TimestampType), true), getcolumnbyordinal(6, IntegerType), StructField(key,BinaryType,true), StructField(value,BinaryType,true), StructField(topic,StringType,true), StructField(partition,IntegerType,true), StructField(offset,LongType,true), StructField(timestamp,TimestampType,true), StructField(timestampType,IntegerType,true))), obj#14: org.apache.spark.sql.Row DeserializeToObject createexternalrow(key#0, value#1, topic#2.toString, partition#3, offset#4L, staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, ObjectType(class java.sql.Timestamp), toJavaTimestamp, timestamp#5, true), timestampType#6, StructField(key,BinaryType,true), StructField(value,BinaryType,true), StructField(topic,StringType,true), StructField(partition,IntegerType,true), StructField(offset,LongType,true), StructField(timestamp,TimestampType,true), StructField(timestampType,IntegerType,true)), obj#14: org.apache.spark.sql.Row
+- LocalRelation <empty>, [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6] +- LocalRelation <empty>, [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6]
23:16:43.465 [main] INFO org.apache.spark.sql.execution.SparkSqlParser - Parsing command: CAST(key AS STRING)
23:16:44.298 [main] INFO org.apache.spark.sql.execution.SparkSqlParser - Parsing command: CAST(value AS STRING)
23:16:44.398 [main] DEBUG org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences - Resolving 'key to key#0
23:16:44.401 [main] DEBUG org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences - Resolving 'value to value#1
23:16:44.496 [main] DEBUG org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1 -
=== Result of Batch Resolution ===
!'Project [unresolvedalias(cast('key as string), None), unresolvedalias(cast('value as string), None)] Project [cast(key#0 as string) AS key#15, cast(value#1 as string) AS value#16]
+- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@5a1f778,kafka,List(),None,List(),None,Map(subscribe -> kafkaToSparkTopic, kafka.bootstrap.servers -> localhost:9092),None), kafka, [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6] +- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@5a1f778,kafka,List(),None,List(),None,Map(subscribe -> kafkaToSparkTopic, kafka.bootstrap.servers -> localhost:9092),None), kafka, [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6]
23:16:44.557 [main] DEBUG org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1 -
=== Result of Batch Resolution ===
!'DeserializeToObject unresolveddeserializer(createexternalrow(getcolumnbyordinal(0, StringType).toString, getcolumnbyordinal(1, StringType).toString, StructField(key,StringType,true), StructField(value,StringType,true))), obj#19: org.apache.spark.sql.Row DeserializeToObject createexternalrow(key#15.toString, value#16.toString, StructField(key,StringType,true), StructField(value,StringType,true)), obj#19: org.apache.spark.sql.Row
+- LocalRelation <empty>, [key#15, value#16] +- LocalRelation <empty>, [key#15, value#16]
23:16:44.796 [main] DEBUG org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1 -
=== Result of Batch Resolution ===
!'DeserializeToObject unresolveddeserializer(createexternalrow(getcolumnbyordinal(0, StringType).toString, getcolumnbyordinal(1, StringType).toString, StructField(key,StringType,true), StructField(value,StringType,true))), obj#22: org.apache.spark.sql.Row DeserializeToObject createexternalrow(key#15.toString, value#16.toString, StructField(key,StringType,true), StructField(value,StringType,true)), obj#22: org.apache.spark.sql.Row
+- LocalRelation <empty>, [key#15, value#16] +- LocalRelation <empty>, [key#15, value#16]
23:16:46.660 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction as:alberto (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:331)
23:16:46.782 [main] DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedAction as:alberto (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:331)
23:16:46.804 [main] INFO org.apache.spark.sql.execution.streaming.StreamExecution - Starting [id = 1a32e91e-4a23-4089-9343-d7940834b98d, runId = 5313abfb-6748-4f51-9c4e-f384db1e9346]. Use /tmp/temporary-4d94a508-a944-4447-9db9-413a210d7212 to store the query checkpoint.
23:16:47.191 [Thread-2] INFO org.apache.spark.SparkContext - Invoking stop() from shutdown hook
23:16:47.256 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping org.spark_project.jetty.server.Server@60fa3495
23:16:47.257 [Thread-2] DEBUG org.spark_project.jetty.server.Server - doStop org.spark_project.jetty.server.Server@60fa3495
23:16:47.300 [SparkUI-28] DEBUG org.spark_project.jetty.util.thread.QueuedThreadPool - ran SparkUI-28-acceptor-0@460f76a6-ServerConnector@71104a4{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
23:16:47.316 [stream execution thread for [id = 1a32e91e-4a23-4089-9343-d7940834b98d, runId = 5313abfb-6748-4f51-9c4e-f384db1e9346]] ERROR org.apache.spark.sql.execution.streaming.StreamExecution - Query [id = 1a32e91e-4a23-4089-9343-d7940834b98d, runId = 5313abfb-6748-4f51-9c4e-f384db1e9346] terminated with error
java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at org.apache.spark.sql.kafka010.KafkaSourceProvider$$anonfun$3.apply(KafkaSourceProvider.scala:82)
at org.apache.spark.sql.kafka010.KafkaSourceProvider$$anonfun$3.apply(KafkaSourceProvider.scala:82)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
at scala.collection.SetLike$class.map(SetLike.scala:93)
at scala.collection.AbstractSet.map(Set.scala:47)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.createSource(KafkaSourceProvider.scala:82)
at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:243)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2$$anonfun$applyOrElse$1.apply(StreamExecution.scala:158)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2$$anonfun$applyOrElse$1.apply(StreamExecution.scala:155)
at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:155)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:153)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan$lzycompute(StreamExecution.scala:153)
at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan(StreamExecution.scala:147)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:276)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
23:16:47.317 [Thread-2] DEBUG org.spark_project.jetty.server.Server - Graceful shutdown org.spark_project.jetty.server.Server@60fa3495 by
23:16:47.325 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping Spark@71104a4{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
23:16:47.325 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping org.spark_project.jetty.server.ServerConnector$ServerConnectorManager@6e9319f
23:16:47.326 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=0 selected=0
23:16:47.326 [Thread-2] DEBUG org.spark_project.jetty.io.ManagedSelector - Stopping org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=0 selected=0
23:16:47.355 [Thread-2] DEBUG org.spark_project.jetty.io.ManagedSelector - Queued change org.spark_project.jetty.io.ManagedSelector$CloseEndPoints@3a133be0 on org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=0 selected=0
23:16:47.356 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Selector loop woken up from select, 0/0 selected
23:16:47.357 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Running change org.spark_project.jetty.io.ManagedSelector$CloseEndPoints@3a133be0
23:16:47.357 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Closing 0 endPoints on org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=0 selected=0
23:16:47.357 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Closed 0 endPoints on org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=0 selected=0
23:16:47.357 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Selector loop waiting on select
23:16:47.358 [Thread-2] DEBUG org.spark_project.jetty.io.ManagedSelector - Queued change org.spark_project.jetty.io.ManagedSelector$CloseSelector@33ed88dc on org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=0 selected=0
23:16:47.358 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Selector loop woken up from select, 0/0 selected
23:16:47.358 [SparkUI-27] DEBUG org.spark_project.jetty.io.ManagedSelector - Running change org.spark_project.jetty.io.ManagedSelector$CloseSelector@33ed88dc
23:16:47.359 [SparkUI-27] DEBUG org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume - EPC Prod/org.spark_project.jetty.io.ManagedSelector$SelectorProducer@caf8d6 produced null
23:16:47.359 [SparkUI-27] DEBUG org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume - EPC Idle/org.spark_project.jetty.io.ManagedSelector$SelectorProducer@caf8d6 produce exit
23:16:47.359 [SparkUI-27] DEBUG org.spark_project.jetty.util.thread.QueuedThreadPool - ran org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=-1 selected=-1
23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.io.ManagedSelector - Stopped org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=-1 selected=-1
23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED org.spark_project.jetty.io.ManagedSelector@21c64522 id=0 keys=-1 selected=-1
23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED org.spark_project.jetty.server.ServerConnector$ServerConnectorManager@6e9319f
23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping HttpConnectionFactory@5d25e6bb[HTTP/1.1]
23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED HttpConnectionFactory@5d25e6bb[HTTP/1.1]
23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping org.spark_project.jetty.util.thread.ScheduledExecutorScheduler@4985cbcb
23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED org.spark_project.jetty.util.thread.ScheduledExecutorScheduler@4985cbcb
23:16:47.359 [Thread-2] INFO org.spark_project.jetty.server.AbstractConnector - Stopped Spark@71104a4{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
23:16:47.359 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED Spark@71104a4{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.server.handler.AbstractHandler - stopping org.spark_project.jetty.server.Server@60fa3495
23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping org.spark_project.jetty.server.handler.ContextHandlerCollection@89ff02e[org.spark_project.jetty.server.handler.gzip.GzipHandler@21526f6c, org.spark_project.jetty.server.handler.gzip.GzipHandler@2c715e84,org.spark_project.jetty.server.handler.gzip.GzipHandler@29876704, org.spark_project.jetty.server.handler.gzip.GzipHandler@379ab47b, org.spark_project.jetty.server.handler.gzip.GzipHandler@3b366632, o.s.j.s.ServletContextHandler@63998bf4{/metrics/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@736ac09a{/SQL,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@3b0ca5e1{/SQL/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@5f78de22{/SQL/execution,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@272a179c{/SQL/execution/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@56781d96{/static/sql,null,SHUTDOWN,@Spark}]
23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.server.handler.AbstractHandler - stopping org.spark_project.jetty.server.handler.ContextHandlerCollection@89ff02e[org.spark_project.jetty.server.handler.gzip.GzipHandler@21526f6c, org.spark_project.jetty.server.handler.gzip.GzipHandler@2c715e84, org.spark_project.jetty.server.handler.gzip.GzipHandler@70fab835, org.spark_project.jetty.server.handler.gzip.GzipHandler@64712be, org.spark_project.jetty.server.handler.gzip.GzipHandler@5ae81e1, org.spark_project.jetty.server.handler.gzip.GzipHandler@54709809, org.spark_project.jetty.server.handler.gzip.GzipHandler@48c40605, org.spark_project.jetty.server.handler.gzip.GzipHandler@21ec5d87, org.spark_project.jetty.server.handler.gzip.GzipHandler@4b21844c, org.spark_project.jetty.server.handler.gzip.GzipHandler@29876704,
org.spark_project.jetty.server.handler.gzip.GzipHandler@67427b69, org.spark_project.jetty.server.handler.gzip.GzipHandler@56102e1c, org.spark_project.jetty.server.handler.gzip.GzipHandler@3b366632, o.s.j.s.ServletContextHandler@63998bf4{/metrics/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@736ac09a{/SQL,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@3b0ca5e1{/SQL/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@5f78de22{/SQL/execution,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@272a179c{/SQL/execution/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@56781d96{/static/sql,null,SHUTDOWN,@Spark}]
23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED org.spark_project.jetty.server.handler.ContextHandlerCollection@89ff02e[org.spark_project.jetty.server.handler.gzip.GzipHandler@21526f6c, org.spark_project.jetty.server.handler.gzip.GzipHandler@2c715e84, org.spark_project.jetty.server.handler.gzip.GzipHandler@70fab835, org.spark_project.jetty.server.handler.gzip.GzipHandler@64712be, org.spark_project.jetty.server.handler.gzip.GzipHandler@5ae81e1, org.spark_project.jetty.server.handler.gzip.GzipHandler@54709809, org.spark_project.jetty.server.handler.gzip.GzipHandler@48c40605, org.spark_project.jetty.server.handler.gzip.GzipHandler@21ec5d87, org.spark_project.jetty.server.handler.gzip.GzipHandler@4b21844c, org.spark_project.jetty.server.handler.gzip.GzipHandler@29876704, org.spark_project.jetty.server.handler.gzip.GzipHandler@379ab47b, org.spark_project.jetty.server.handler.gzip.GzipHandler@7cc586a8, org.spark_project.jetty.server.handler.gzip.GzipHandler@2f4854d6, org.spark_project.jetty.server.handler.gzip.GzipHandler@388ffbc2, org.spark_project.jetty.server.handler.gzip.GzipHandler@21d5c1a0, org.spark_project.jetty.server.handler.gzip.GzipHandler@3ec11999, org.spark_project.jetty.server.handler.gzip.GzipHandler@67ef029, org.spark_project.jetty.server.handler.gzip.GzipHandler@560cbf1a, org.spark_project.jetty.server.handler.gzip.GzipHandler@7a11c4c7, org.spark_project.jetty.server.handler.gzip.GzipHandler@b5cc23a, org.spark_project.jetty.server.handler.gzip.GzipHandler@660e9100, org.spark_project.jetty.server.handler.gzip.GzipHandler@16fb356, org.spark_project.jetty.server.handler.gzip.GzipHandler@67427b69, org.spark_project.jetty.server.handler.gzip.GzipHandler@56102e1c, org.spark_project.jetty.server.handler.gzip.GzipHandler@3b366632, o.s.j.s.ServletContextHandler@63998bf4{/metrics/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@736ac09a{/SQL,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@3b0ca5e1{/SQL/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@5f78de22{/SQL/execution,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@272a179c{/SQL/execution/json,null,SHUTDOWN,@Spark}, o.s.j.s.ServletContextHandler@56781d96{/static/sql,null,SHUTDOWN,@Spark}]
23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping org.spark_project.jetty.server.handler.ErrorHandler@29a60c27
23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.server.handler.AbstractHandler - stopping org.spark_project.jetty.server.handler.ErrorHandler@29a60c27
23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED org.spark_project.jetty.server.handler.ErrorHandler@29a60c27
23:16:47.360 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - stopping SparkUI{STARTED,8<=8<=200,i=8,q=0}
23:16:47.430 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED SparkUI{STOPPED,8<=8<=200,i=0,q=0}
23:16:47.443 [Thread-2] DEBUG org.spark_project.jetty.util.component.AbstractLifeCycle - STOPPED org.spark_project.jetty.server.Server@60fa3495
Exception in thread "stream execution thread for [id = 1a32e91e-4a23-4089-9343-d7940834b98d, runId = 5313abfb-6748-4f51-9c4e-f384db1e9346]" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at org.apache.spark.sql.kafka010.KafkaSourceProvider$$anonfun$3.apply(KafkaSourceProvider.scala:82)
at org.apache.spark.sql.kafka010.KafkaSourceProvider$$anonfun$3.apply(KafkaSourceProvider.scala:82)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
at scala.collection.SetLike$class.map(SetLike.scala:93)
at scala.collection.AbstractSet.map(Set.scala:47)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.createSource(KafkaSourceProvider.scala:82)
at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:243)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2$$anonfun$applyOrElse$1.apply(StreamExecution.scala:158)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2$$anonfun$applyOrElse$1.apply(StreamExecution.scala:155)
at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:155)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$2.applyOrElse(StreamExecution.scala:153)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan$lzycompute(StreamExecution.scala:153)
at org.apache.spark.sql.execution.streaming.StreamExecution.logicalPlan(StreamExecution.scala:147)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:276)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
23:16:47.515 [Thread-2] INFO org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://192.168.10.1:4040
23:16:47.706 [dispatcher-event-loop-1] INFO org.apache.spark.MapOutputTrackerMasterEndpoint - MapOutputTrackerMasterEndpoint stopped!
First of all: are the === Result of Batch Resolution === outputs with the unresolved DeserializeToObject deserializers correct? And how do I extract the value field, whose String holds the JSON I am interested in?
My pom.xml file is:
<dependencies>
<dependency>
<groupId>org.springframework.kafka</groupId>
<artifactId>spring-kafka</artifactId>
<version>2.1.6.RELEASE</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-web -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<version>2.0.2.RELEASE</version>
</dependency>
<dependency>
<groupId>com.satori</groupId>
<artifactId>satori-rtm-sdk</artifactId>
<version>1.0.3</version>
</dependency>
<dependency>
<groupId>com.satori</groupId>
<artifactId>satori-rtm-sdk-core</artifactId>
<version>1.0.3</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>0.9.0-incubating</version>
</dependency>
<!-- This is for KafkaUtils.createDirectStream-->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_2.10</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql-kafka-0-10_2.11</artifactId>
<version>2.2.0</version>
</dependency>
</dependencies>
From spark-shell I see Spark version 2.3.0 and Scala version 2.11.8 (with Java 1.8.0_171), and I am currently running kafka_2.11-1.1.0 (so Kafka 1.1.0).
Answer (score: 1)
You have a jar incompatibility problem: you are mixing not only multiple incompatible Spark versions, but also incompatible Scala versions.
Your pom.xml should essentially contain only the following dependencies:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql-kafka-0-10_2.11</artifactId>
<version>2.3.1</version>
</dependency>
Note the _2.11 suffix in each artifactId and the version 2.3.1: they denote the Scala version and the Spark version, respectively. spark-sql_2.11 gives you Spark Structured Streaming, while spark-sql-kafka-0-10_2.11 gives you the Spark connector for Apache Kafka.
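A quick way to verify that no other Scala 2.10 or pre-2.x Spark artifacts are still pulled in transitively is mvn dependency:tree.

As for extracting the JSON from the value column, a minimal sketch using from_json from org.apache.spark.sql.functions (the schema fields below are hypothetical; define them to match your actual payload):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical schema; replace the fields with the ones in your JSON.
StructType schema = new StructType()
        .add("sensor", DataTypes.StringType)
        .add("value", DataTypes.IntegerType);

Dataset<Row> parsed = df
        .selectExpr("CAST(value AS STRING) AS json")
        .select(from_json(col("json"), schema).as("data"))
        .select("data.*"); // one column per JSON field

After that, each JSON field is a regular column you can query or write out with writeStream as before.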