I have a Spark (v1.6.0) application that connects to Impala using sqlContext to create a DataFrame:
DataFrame dataframe = sqlContext.read().format("jdbc").options(ImpalaConnection.getDatabaseConnection(dbtable)).load();
The getDatabaseConnection method returns a Map with the following values; I am using the ImpalaJDBC4 (2.5.35) driver:
options.put("url", "jdbc:impala:xxxx:21050;AuthMech=1;KrbRealm=CLOUDERA;KrbHostFQDN=host;KrbServiceName=impala");
options.put("driver", "com.cloudera.impala.jdbc4.Driver");
options.put("dbtable", "db.table");
options.put("partitionColumn", "rowId");
options.put("numPartitions", "6");
options.put("upperBound", "10000000");
options.put("lowerBound", "1");
options.put("fetchSize", "50000");
return options;
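For context, with partitionColumn, lowerBound, upperBound, and numPartitions set this way, Spark 1.6 splits the table scan into six concurrent JDBC queries, each constrained by a stride-based WHERE clause on rowId (the exact values below are approximate, derived from Spark's JDBC partitioning logic), and each connection fetches up to 50000 rows per round trip per the fetchSize option:

rowId < 1666667
rowId >= 1666667 AND rowId < 3333333
rowId >= 3333333 AND rowId < 4999999
rowId >= 4999999 AND rowId < 6666665
rowId >= 6666665 AND rowId < 8333331
rowId >= 8333331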
The error occurs when I perform an action on this DataFrame (dataframe -> javaRDD -> map).
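A minimal sketch of that pipeline, assuming Java 8; the mapping function, output path, and saveAsHadoopFile call here are illustrative placeholders, though the stack trace below does show the failure inside a pair-RDD save (saveAsHadoopDataset):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.sql.Row;
import scala.Tuple2;

// Convert the DataFrame to a JavaRDD and map each Row to a key/value pair.
JavaPairRDD<String, String> pairs = dataframe.javaRDD().mapToPair(
    (Row row) -> new Tuple2<>(String.valueOf(row.get(0)), row.mkString(",")));

// The save is the action: it triggers the partitioned JDBC fetch from Impala,
// and the executors fail while iterating the result set.
pairs.saveAsHadoopFile("/tmp/impala-export", Text.class, Text.class, TextOutputFormat.class);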
I receive the following error:
java.sql.SQLException: [Simba][ImpalaJDBCDriver](500150) Error setting/closing connection: Fetch Error.
at com.cloudera.impala.hivecommon.api.HS2Client.checkFetchErrors(HS2Client.java:176)
at com.cloudera.impala.hivecommon.dataengine.BackgroundFetcher.getNextBuffer(BackgroundFetcher.java:201)
at com.cloudera.impala.hivecommon.dataengine.HiveJDBCResultSet.moveToNextRow(HiveJDBCResultSet.java:391)
at com.cloudera.impala.jdbc.common.SForwardResultSet.next(SForwardResultSet.java:2910)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.getNext(JDBCRDD.scala:369)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.hasNext(JDBCRDD.scala:498)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1197)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1251)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1205)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Any input? Thanks.