Spark Phoenix connection

Time: 2021-06-17 13:11:12

Tags: apache-spark apache-spark-sql phoenix

Spark version - 2.1.1.2.6.1.0-129, Scala version - 2.11.8, Phoenix version - 4.7

I am trying to fetch data from Kafka into Spark and push it to Phoenix. I developed a program that does this, but it only works in local mode. When I try to run it in cluster mode with master yarn, the program keeps running without pushing any data to Phoenix. To check the basic connectivity between Spark and Phoenix in yarn mode, I tried spark-shell in yarn mode. I started spark-shell with the following command.

spark-shell --master yarn \
--conf "spark.driver.extraClassPath=phoenix-spark2.jar:phoenix-4.7.0.2.6.1.0-129-client.jar:/etc/hbase/conf" \
--conf "spark.executor.extraClassPath=phoenix-spark2.jar:phoenix-4.7.0.2.6.1.0-129-client.jar:/etc/hbase/conf" \
--jars \
spark-streaming-kafka-0-10_2.11-2.1.1.jar,\
kafka-clients-0.10.1.2.6.1.0-129.jar,\
phoenix-spark2.jar,\
phoenix-core-4.7.0.2.6.5.0-292.jar,\
phoenix-4.7.0.2.6.5.0-292-spark2.jar,\
phoenix-4.7.0.2.6.1.0-129-client.jar
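
The streaming application itself is not reproduced here; a minimal sketch of the kind of Kafka-to-Phoenix pipeline it implements would look roughly like the following (the brokers, topic, schema, Zookeeper quorum and table name below are placeholders, not the real values):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaToPhoenix {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-phoenix").getOrCreate()
    import spark.implicits._

    val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

    // Placeholder Kafka settings -- not the real brokers/topic/group
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker-host:6667",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "phoenix-loader",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("my_topic"), kafkaParams)
    )

    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        // Assume each record value is "id,name"; the real parsing/schema differs
        val df = rdd.map(_.value.split(","))
          .map(a => (a(0), a(1)))
          .toDF("ID", "NAME")

        // Same write pattern as used in the spark-shell test below
        df.write.format("org.apache.phoenix.spark")
          .mode(SaveMode.Overwrite)
          .options(Map("zkUrl" -> "zk-host:2181:/hbase-unsecure", "table" -> "MY_TABLE"))
          .save()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}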

I have a single record in a DataFrame and tried to push it to Phoenix using the following syntax:

data.write.format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)
  .options(collection.immutable.Map(
    "zkUrl" -> zkUrl,
    "table" -> table))
  .save()

The code above works when the master is local and the data gets inserted into Phoenix, but when the master is yarn it throws an error. Below is a copy of the first few lines of the error log.

2021-06-17 16:13:24,267 WARN  [task-result-getter-0] scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException: scala.collection.Iterator$$anon$11 cannot be cast to java.lang.String
at org.apache.phoenix.spark.DataFrameFunctions$$anonfun$1.apply(DataFrameFunctions.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

2021-06-17 16:13:26,093 ERROR [task-result-getter-3] scheduler.TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, sandbox.hortonworks.com, executor 2): java.lang.ClassCastException: scala.collection.Iterator$$anon$11 cannot be cast to java.lang.String
at org.apache.phoenix.spark.DataFrameFunctions$$anonfun$1.apply(DataFrameFunctions.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)

0 Answers:

No answers yet.