I have created a SparkContext and a global spark variable. When I read ORC files, I can load them as simply as spark.read.format("orc").load("filepath"), but for Avro I can't seem to do the same, even when I try to import the jar like this:
spark.conf.set("spark.jars.packages",
"file:///projects/apps/lib/spark-avro_2.11-3.2.0.jar")
and then try to read the Avro file. I get an error like this:
Py4JJavaError: An error occurred while calling o65.load.
: org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Please find an Avro package at http://spark.apache.org/third-party-projects.html;
Answer (score: 2):
spark.jars.packages takes Gradle-compatible coordinates:
spark.jars.packages org.apache.spark:spark-avro_2.12:2.4.2
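For illustration, a minimal sketch of the same idea done from PySpark itself, assuming a fresh interpreter where no JVM has been started yet (the app name is a placeholder):

from pyspark.sql import SparkSession

# spark.jars.packages is resolved when the session (and its JVM) starts,
# so it has to be set on the builder, not via spark.conf.set on a live session.
spark = (
    SparkSession.builder
    .appName("avro-demo")
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:2.4.2")
    .getOrCreate()
)

# With the package on the classpath, Avro reads work like the ORC case:
df = spark.read.format("avro").load("filepath")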
Moreover, as explained in How to load jar dependencies in IPython Notebook, it has to be set before the JVM and the SparkSession / SparkContext are initialized.
So you have to set the package before the session starts, for example:
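In a notebook with a long-lived kernel, one common way to do that is a sketch like the following, assuming the environment-variable approach from the linked question (restart the kernel first if a session already exists):

import os

# Must run before pyspark is imported and before any SparkContext exists;
# the trailing "pyspark-shell" token is required by the PySpark launcher.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages org.apache.spark:spark-avro_2.12:2.4.2 pyspark-shell"
)

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

Alternatively, the spark.jars.packages line above can go into conf/spark-defaults.conf, which applies it to every session without per-notebook setup.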