Spark can read from Phoenix, but writing fails with a "No suitable driver found for jdbc:phoenix" error

Date: 2017-07-27 10:26:25

Tags: apache-spark jdbc hbase

Recently my Phoenix/Spark integration has been acting up. It worked fine last week, and in fact there is still older code running on the cluster that uses a very similar approach. Only the new code I have written misbehaves: it throws a strange error when writing a DataFrame to Phoenix, while reading works fine:

def main(args: Array[String]): Unit = {
    if (args.length < 1) {
        println("Needs a month range as: startingmonth,endingmonth")
        System.exit(1)
    }

    val month_range = args(0).split(",")
    val rootLogger = Logger.getRootLogger()
    rootLogger.setLevel(Level.ERROR)

    val start = System.currentTimeMillis()

    val sparkConf = new SparkConf().setAppName("userProfile") //.set("spark.testing.memory", "536870912")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)

    import sqlContext.implicits._

    // Reading from Phoenix over JDBC works when this block is uncommented.
    /*val itemProfileDF = sqlContext.read.format("jdbc")
        .options(ImmutableMap.of("driver", "org.apache.phoenix.jdbc.PhoenixDriver", "url",
            "jdbc:phoenix:<ZK URL>:5181", "dbtable", "ITEMPROFILES")).load()
    itemProfileDF.show() //[[ [ TAKE NOTE OF THIS COMMENTED PART] ]]*/

    val callUsageSummary = sqlContext.read.parquet("/edw_data_vol/hp_tab/CALL_USAGE_SUMMARY21_FCT")

    // Writing the DataFrame to Phoenix is what fails.
    callUsageSummary.write.format("org.apache.phoenix.spark")
        .mode(SaveMode.Overwrite)
        .options(ImmutableMap.of(
            "driver", "org.apache.phoenix.jdbc.PhoenixDriver",
            "zkUrl", "jdbc:phoenix:<ZK URL>:5181",
            "table", "AGGREGATIONFINAL"))
        .save()

    print("Done")
    val stop = System.currentTimeMillis()
    System.out.println("Time taken to process the files " + (stop - start) / 1000 + "s")
}

This code throws the following error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 70 in stage 2.0 failed 4 times, most recent failure: Lost task 70.3 in stage 2.0 (TID 13): java.sql.SQLException: No suitable driver found for jdbc:phoenix:<ZK URL>:5181:/hbase;
at java.sql.DriverManager.getConnection(DriverManager.java:596)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:82)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:70)
at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getUpsertColumnMetadataList(PhoenixConfigurationUtil.java:230)
at org.apache.phoenix.spark.DataFrameFunctions$$anonfun$2.apply(DataFrameFunctions.scala:45)
at org.apache.phoenix.spark.DataFrameFunctions$$anonfun$2.apply(DataFrameFunctions.scala:41)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
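
Judging from the trace, the connection is being opened inside an executor task (Executor$TaskRunner), so it looks as though the Phoenix JDBC driver is simply not registered in the executor JVMs. As a quick check (a minimal diagnostic sketch of my own, not part of the actual job), I can test whether the driver class is even visible on the executors:

    // Diagnostic sketch (hypothetical): try to load the Phoenix driver class
    // inside executor tasks. "NOT FOUND" results would mean the phoenix jar
    // is not on the executor classpath. Note that a handful of tasks will not
    // necessarily touch every executor on a large cluster.
    val driverCheck = sc.parallelize(1 to 8, 8).map { _ =>
        try Class.forName("org.apache.phoenix.jdbc.PhoenixDriver").getName
        catch { case e: ClassNotFoundException => "NOT FOUND: " + e.getMessage }
    }.collect()
    driverCheck.foreach(println)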

However, if I uncomment the marked section, the code works fine, but only in this particular class; in other classes it may or may not work, and it is unreliable most of the time. I cannot figure out what change makes it work. It is most likely not a Phoenix problem, because I only notice the behaviour change after repackaging with Maven, and my old project still runs fine.
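
One other thing I am not sure about is the zkUrl value itself. The phoenix-spark examples I have seen pass only the ZooKeeper quorum (host and port), not a full jdbc:phoenix: URL, so I do not know whether the prefix in my option matters here. A minimal sketch of the write in that form, using the same table and quorum as above (the bare-quorum value is my assumption, not something I have confirmed fixes anything):

    // Sketch (assumption): pass zkUrl as a bare ZooKeeper quorum, without the
    // "jdbc:phoenix:" prefix, as the phoenix-spark documentation shows.
    callUsageSummary.write
        .format("org.apache.phoenix.spark")
        .mode(SaveMode.Overwrite)
        .option("table", "AGGREGATIONFINAL")
        .option("zkUrl", "<ZK URL>:5181")
        .save()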

There are also some classes whose code used to work fine but now does not, even though I have not repackaged them (I think).

On the command line I pass:

--conf "spark.driver.extraClassPath=/opt/mapr/spark/spark-1.6.1/lib/phoenix-spark-4.8.1-HBase-1.1.jar,/opt/mapr/spark/spark-1.6.1/lib/hbase-protocol-1.1.1-mapr-1602.jar"

I know these are not the client jars mentioned in the official Phoenix documentation, but the code that works uses them, and sometimes this code does too. I have tried the client jar as well as every possible combination of these three jars, for both the executors and the driver. I have finally given up and am writing here in case someone knows what might be going on.
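
For what it is worth, one detail of the submit command above that I am unsure of: spark.driver.extraClassPath takes a colon-separated classpath on Linux, while comma-separated lists are what --jars expects, so the comma in my --conf might mean only a single (nonexistent) path is actually being added. A sketch of how the same jars could be passed to both the driver and the executors (same paths as above; the --jars and executor settings are my assumption, not something I have verified on this cluster):

--jars /opt/mapr/spark/spark-1.6.1/lib/phoenix-spark-4.8.1-HBase-1.1.jar,/opt/mapr/spark/spark-1.6.1/lib/hbase-protocol-1.1.1-mapr-1602.jar \
--conf "spark.driver.extraClassPath=/opt/mapr/spark/spark-1.6.1/lib/phoenix-spark-4.8.1-HBase-1.1.jar:/opt/mapr/spark/spark-1.6.1/lib/hbase-protocol-1.1.1-mapr-1602.jar" \
--conf "spark.executor.extraClassPath=/opt/mapr/spark/spark-1.6.1/lib/phoenix-spark-4.8.1-HBase-1.1.jar:/opt/mapr/spark/spark-1.6.1/lib/hbase-protocol-1.1.1-mapr-1602.jar"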

0 Answers:

No answers