Recently my Phoenix and Spark integration has been acting up. It worked fine last week, and in fact similar code is still running on the cluster right now. It is only the new code I write that fails: it throws a strange error when writing a DataFrame to Phoenix, while reading works fine:
import com.google.common.collect.ImmutableMap // assuming Guava's ImmutableMap here
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

def main(args: Array[String]): Unit = {
  if (args.length < 1) {
    println("Needs a month range as: startingmonth,endingmonth")
    System.exit(1)
  }
  val month_range = args(0).split(",")

  // Silence everything below ERROR so Spark's logging does not drown the output
  val rootLogger = Logger.getRootLogger()
  rootLogger.setLevel(Level.ERROR)

  val start = System.currentTimeMillis()

  val sparkConf = new SparkConf().setAppName("userProfile") //.set("spark.testing.memory", "536870912")
  val sc = new SparkContext(sparkConf)
  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._

  /*val itemProfileDF = sqlContext.read.format("jdbc")
    .options(ImmutableMap.of(
      "driver", "org.apache.phoenix.jdbc.PhoenixDriver",
      "url", "jdbc:phoenix:<ZK URL>:5181",
      "dbtable", "ITEMPROFILES")).load()
  itemProfileDF.show() //[[ [ TAKE NOTE OF THIS COMMENTED PART] ]]*/

  // Read the source parquet data and write it to the Phoenix table AGGREGATIONFINAL
  val callUsageSummary = sqlContext.read.parquet("/edw_data_vol/hp_tab/CALL_USAGE_SUMMARY21_FCT")
  callUsageSummary.write
    .format("org.apache.phoenix.spark")
    .mode(SaveMode.Overwrite)
    .options(ImmutableMap.of(
      "driver", "org.apache.phoenix.jdbc.PhoenixDriver",
      "zkUrl", "jdbc:phoenix:<ZK URL>:5181",
      "table", "AGGREGATIONFINAL"))
    .save()

  print("Done")

  val stop = System.currentTimeMillis()
  System.out.println("Time taken to process the files: " + (stop - start) / 1000 + "s")
}
This code throws the following error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 70 in stage 2.0 failed 4 times, most recent failure: Lost task 70.3 in stage 2.0 (TID 13): java.sql.SQLException: No suitable driver found for jdbc:phoenix:<ZK URL>:5181:/hbase;
at java.sql.DriverManager.getConnection(DriverManager.java:596)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:82)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:70)
at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getUpsertColumnMetadataList(PhoenixConfigurationUtil.java:230)
at org.apache.phoenix.spark.DataFrameFunctions$$anonfun$2.apply(DataFrameFunctions.scala:45)
at org.apache.phoenix.spark.DataFrameFunctions$$anonfun$2.apply(DataFrameFunctions.scala:41)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
However, if I uncomment the part marked above, the code works fine, but only in this particular class; in other classes it may or may not work and is almost completely unreliable. Most of the time I cannot figure out what change makes it work. It is most likely not a problem with Phoenix itself, because the behaviour only changes after I repackage with Maven. Besides, my old project runs just fine.
There are also classes whose code still works, but those have not been repackaged recently (I think).
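Since the failing tasks complain that no suitable driver is found, a quick probe like the one below should show whether the Phoenix driver class is even visible on the executor classpath (just a sketch I am including for context, not part of the failing job):

// Sketch only: run Class.forName inside tasks to check executor classpath visibility
val checks = sc.parallelize(1 to 100, 10).mapPartitions { _ =>
  val visible =
    try { Class.forName("org.apache.phoenix.jdbc.PhoenixDriver"); true }
    catch { case _: ClassNotFoundException => false }
  Iterator(visible)
}.collect()
println("Phoenix driver visible in every task: " + checks.forall(identity))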
On the command line I pass:
--conf "spark.driver.extraClassPath=/opt/mapr/spark/spark-1.6.1/lib/phoenix-spark-4.8.1-HBase-1.1.jar,/opt/mapr/spark/spark-1.6.1/lib/hbase-protocol-1.1.1-mapr-1602.jar"
I know these are not the client jar that the official Phoenix documentation mentions, but the code has worked with these jars before, and sometimes it still does. I have also tried the client jar, as well as every possible combination of these three jars, for both the executor and the driver. I have finally given up and am posting here in case anyone knows what might be going on.
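For what it's worth, my current understanding (which may be wrong) is that extraClassPath is an ordinary JVM classpath, so its entries would be separated with ':' rather than ',', and that --jars takes a comma-separated list and also ships the jars to the executors. A sketch of a submit command along those lines (main class and application jar are placeholders):

spark-submit \
  --conf "spark.driver.extraClassPath=/opt/mapr/spark/spark-1.6.1/lib/phoenix-spark-4.8.1-HBase-1.1.jar:/opt/mapr/spark/spark-1.6.1/lib/hbase-protocol-1.1.1-mapr-1602.jar" \
  --conf "spark.executor.extraClassPath=/opt/mapr/spark/spark-1.6.1/lib/phoenix-spark-4.8.1-HBase-1.1.jar:/opt/mapr/spark/spark-1.6.1/lib/hbase-protocol-1.1.1-mapr-1602.jar" \
  --jars /opt/mapr/spark/spark-1.6.1/lib/phoenix-spark-4.8.1-HBase-1.1.jar,/opt/mapr/spark/spark-1.6.1/lib/hbase-protocol-1.1.1-mapr-1602.jar \
  --class <main class> <application jar> startingmonth,endingmonth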