Question

I am trying to run a simple select:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .config("hive.mapred.supports.subdirectories","TRUE") \
    .config("mapred.input.dir.recursive","TRUE") \
    .appName("sql_test") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("SELECT * FROM db.table LIMIT 10").show()

on a table that uses org.openx.data.jsonserde.JsonSerDe, but I get an exception:

ERROR hive.log: error in initSerDe: java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found
java.lang.ClassNotFoundException: Class org.openx.data.jsonserde.JsonSerDe not found
...
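
Since the exception names a specific class, one check that is independent of Spark is whether a given jar actually contains that class file. The following is a minimal sketch using only the Python standard library; the helper name jar_contains_class is hypothetical, and it assumes the jar is available on the local filesystem (not on HDFS).

```python
import zipfile

# Class name taken from the exception message.
SERDE_CLASS = "org.openx.data.jsonserde.JsonSerDe"


def jar_contains_class(jar_path, class_name=SERDE_CLASS):
    """Return True if the jar (a zip archive) holds the .class entry for class_name.

    jar_path must be a local file path; this is an illustrative helper,
    not part of Spark or Hive.
    """
    # A class a.b.C is stored in the jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

If the jar only exists on HDFS, it would first have to be copied to a local path (for example with hdfs dfs -get) before calling jar_contains_class on it. A True result only confirms the jar's contents; it does not show that Spark has the jar on its classpath.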

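I have tried every suggestion I found on Stack Overflow and the Cloudera forums, but nothing seems to change. The jar is present on the cluster, and in addition I have tried: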

Using --jars /path/json-serde-1.3.8-jar-with-dependencies.jar
Using .config("spark.jars", "hdfs://path/json-serde-1.3.8-jar-with-dependencies.jar")
Using spark.sparkContext.addPyFile("hdfs://path/json-serde-1.3.8-jar-with-dependencies.jar")
Specifying the class with --class org.openx.data.jsonserde.JsonSerDe.
Other configurations, such as setting --conf spark.executor/driver.classpath.first=true and --conf spark.executor/driver.extraClassPath=hdfs://path/json-serde-1.3.8-jar-with-dependencies.jar
Deploying in client/cluster mode, with the master set to local/yarn
Compiling the project in Scala instead of using Python
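
For reference, the command-line variants of the attempts above can be written out as a single spark-submit invocation. The sketch below only builds the argument list and does not run Spark; the function names and the script name are hypothetical. It also assumes the jar sits at the same local path on every node, because as far as I know the extraClassPath settings are plain JVM classpath entries, so an hdfs:// URI would not be resolved there. This is not a confirmed fix for the exception.

```python
import shlex

# Placeholder path copied from the attempts above; the real location is not given.
LOCAL_JAR = "/path/json-serde-1.3.8-jar-with-dependencies.jar"


def build_submit_command(script, jar=LOCAL_JAR, master="yarn", deploy_mode="client"):
    """Return a spark-submit argument list that passes one extra jar.

    Assumption: `jar` is a local filesystem path that exists on the driver
    host and on every executor host.
    """
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--jars", jar,                # ship the jar with the application
        "--driver-class-path", jar,   # put it on the driver JVM classpath
        "--conf", "spark.executor.extraClassPath=" + jar,
        script,
    ]


def as_shell(args):
    """Join the arguments into one shell-safe command string."""
    return " ".join(shlex.quote(a) for a in args)
```

Calling as_shell(build_submit_command("sql_test.py")) gives a command line that can be pasted into a terminal. The driver classpath is passed as a spark-submit flag rather than through SparkSession.builder.config because, in client mode, the driver JVM has already started by the time the builder runs.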

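But nothing seems to work, and the output stays the same. With Hive I can access the table without any problem, but I cannot do anything with it from Spark.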

Any suggestions as to what the problem might be? Thanks

Pyspark class org.openx.data.jsonserde.JsonSerDe not found even after including the jar

0 answers: