PySpark: Impala JDBC driver does not support this optional feature

Posted: 2018-11-09 12:53:03

Tags: jdbc, pyspark, spark-streaming, cloudera, impala

I am using PySpark for Spark Streaming. I can stream the data and create the DataFrame properly without any issues. I can also insert the data into an Impala table that was created with only a few (5) sample columns out of the total (72) columns in the Kafka message. However, when I create a table with all the columns and the proper data types, and the DataFrame likewise has every column mentioned in the Kafka stream message, I get the following exception.


java.sql.SQLFeatureNotSupportedException: [Cloudera] JDBC driver does not support this optional feature.
        at com.cloudera.impala.exceptions.ExceptionConverter.toSQLException(Unknown Source)
        at com.cloudera.impala.jdbc.common.SPreparedStatement.checkTypeSupported(Unknown Source)
        at com.cloudera.impala.jdbc.common.SPreparedStatement.setNull(Unknown Source)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:627)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:782)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:782)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2064)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2064)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

I have searched a lot about this but could not find any solution. I also enabled debug logging, but it still does not say which feature the driver does not support. Any help or proper guidance would be greatly appreciated. Thanks.
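
From the stack trace, the call that fails is SPreparedStatement.setNull, which Spark's JdbcUtils.savePartition invokes whenever a DataFrame field is NULL, so one thing I could still check is whether the streamed columns actually contain NULLs. A minimal diagnostic sketch (count_nulls below is just an illustrative helper, not part of my job):

from pyspark.sql import functions as F

def count_nulls(df):
    # Illustrative helper: returns one row holding the NULL count of every column in df.
    return df.select([F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns])

# e.g. inside createDFToParquet(), right before df.write.jdbc(...):
# count_nulls(df).show(truncate=False)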

Version details:

PySpark: 2.2.0, Kafka: 0.10.2, Cloudera: 5.15.0, Cloudera Impala: 2.12.0-cdh5.15.0, Cloudera Impala JDBC driver: 2.6.4

The code I am using:

import json
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark.sql import SparkSession,Row
from pyspark.sql.functions import lit
from pyspark.sql.types import *

conf = SparkConf().setAppName("testkafkarecvstream")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)
spark = SparkSession.builder.appName("testkafkarecvstream").getOrCreate()
jdbcUrl = "jdbc:impala://hostname:21050/dbName;AuthMech=0;"

fields = [
                 StructField("column_name01", StringType(), True),
                 StructField("column_name02", StringType(), True),
                 StructField("column_name03", DoubleType(), True),
                 StructField("column_name04", StringType(), True),
                 StructField("column_name05", IntegerType(), True),
                 StructField("column_name06", StringType(), True),
                  .....................
                 StructField("column_name72", StringType(), True),
]

schema = StructType(fields)

def make_rows(parts):
    customRow = Row(column_name01=datatype(parts['column_name01']),
                              .....,
                              column_name72=datatype(parts['column_name72'])
                           )
    return customRow


def createDFToParquet(rdd):
    try:
        df = spark.createDataFrame(rdd,schema)
        df.show()
        df.write.jdbc(jdbcUrl,
                      table="table_name",
                      mode="append")
    except Exception as e:
        print str(e)


zkNode = "zkNode_name:2181"
topic = "topic_name"

# Receiver-based Kafka stream
kvs = KafkaUtils.createStream(ssc,
                              zkNode,
                              "consumer-group-id",
                              {topic:5},
                              {"auto.offset.reset" : "smallest"})

lines = kvs.map(lambda x: x[1])
conv = lines.map(lambda x: json.loads(x))
table = conv.map(make_rows)
table.foreachRDD(createDFToParquet)

table.pprint()

ssc.start()
ssc.awaitTermination()
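
One workaround I am considering (only a sketch, not a confirmed fix): since the trace ends in setNull, replacing NULLs with type-appropriate defaults before the write should keep Spark from ever calling setNull against this driver. The defaults below are purely illustrative:

def write_without_nulls(df):
    # na.fill("") only touches string columns, na.fill(0) only numeric columns.
    filled = df.na.fill("").na.fill(0)
    filled.write.jdbc(jdbcUrl, table="table_name", mode="append")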

0 Answers:

No answers yet.