I can't read an Avro file using the spark-avro library. Here are the steps I took:
spark-shell --jars avro/spark-avro_2.10-0.1.jar
Then I ran the commands given in the project's README:
import com.databricks.spark.avro._
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val episodes = sqlContext.avroFile("episodes.avro")
The call to sqlContext.avroFile("episodes.avro") fails with the following error:
scala> val episodes = sqlContext.avroFile("episodes.avro")
java.lang.IncompatibleClassChangeError: class com.databricks.spark.avro.AvroRelation has interface org.apache.spark.sql.sources.TableScan as super class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
Answer 0 (score: 6)
My bad. The README states it clearly:
Versions
Spark changed how it reads / writes data in 1.4, so please use the correct version of this dedicated for your spark version
1.3 -> 1.0.0
1.4+ -> 1.1.0-SNAPSHOT
I was using Spark 1.3.1 with spark-avro 1.1.0. Once I switched to spark-avro 1.0.0, it worked.
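The compatibility rule quoted from the README can be sketched as a small helper. This is purely illustrative (the object and method names are made up); the actual fix is simply passing the matching jar to spark-shell:

```scala
// Hypothetical helper mapping a Spark version to the spark-avro
// release that the README lists as compatible with it.
object SparkAvroCompat {
  def sparkAvroFor(sparkVersion: String): String = {
    // Compare only the major.minor prefix: 1.3.x -> 1.0.0, 1.4+ -> 1.1.0-SNAPSHOT
    val Array(major, minor) = sparkVersion.split("\\.").take(2).map(_.toInt)
    if (major == 1 && minor <= 3) "1.0.0" else "1.1.0-SNAPSHOT"
  }
}
```

For the setup above, SparkAvroCompat.sparkAvroFor("1.3.1") gives "1.0.0", which is the combination that worked.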
Answer 1 (score: 0)
import org.apache.spark.sql.SparkSession

// appName and master must be defined with your own values,
// e.g. an application name string and "local[*]".
val spark = SparkSession.builder()
  .appName(appName).master(master).getOrCreate()

// Avro files embed their own schema, so the CSV-style options
// "header" and "inferSchema" have no effect and are dropped here.
val episodes = spark.read.format("com.databricks.spark.avro")
  .load("episodes.avro")
episodes.show(10)
Answer 2 (score: 0)
Since the spark-avro module is external, there is no .avro method on DataFrameReader or DataFrameWriter. To load or save data in Avro format, you need to specify the data source format as avro.
Example:
val usersDF = spark.read.format("avro").load("examples/src/main/resources/users.avro")
usersDF.select("name", "favorite_color").write.format("avro").save("namesAndFavColors.avro")