How do I load an XML file in Spark 2.0?
val rd = spark.read.format("com.databricks.spark.xml").load("C:/Users/kumar/Desktop/d.xml")
I am getting an error saying com.databricks.spark.xml is not available:
java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
... 48 elided
Answer 0 (score: 2)
A ClassNotFoundException means the spark-xml classes are not on Spark's classpath. One fix is to build a fat jar: declare the package as a dependency in build.sbt and produce the jar with sbt assembly, as sketched below. If that does not work, copy the jar into $SPARK_HOME/jars and try again.
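A minimal sketch of such a build.sbt, assuming Spark 2.0 on Scala 2.11 (the version numbers below are illustrative and should match your cluster):

name := "xml-loader"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Spark itself is marked "provided" so it is not bundled into the fat jar
  "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided",
  // spark-xml is bundled so the data source can be found at runtime
  "com.databricks" %% "spark-xml" % "0.4.1"
)

With the sbt-assembly plugin enabled in project/plugins.sbt, running sbt assembly produces a single jar that you can pass to spark-submit.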
Answer 1 (score: 1)
Alternatively, you can add the jar file to the spark shell directly. Download the spark-xml_2.10-0.2.0.jar file, copy it into Spark's classpath, and load it into the running spark shell with the :cp command:
:cp spark-xml_2.10-0.2.0.jar
/*
The jar file is now imported into the spark shell,
so the data source can be used anywhere in your code inside the shell.
*/
val rd = spark.read.format("com.databricks.spark.xml").load("C:/Users/kumar/Desktop/d.xml")
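Note that spark-xml_2.10 targets Scala 2.10, while Spark 2.0 ships with Scala 2.11 by default, so pick the artifact that matches your Scala version. If the machine has internet access, a simpler option is to let spark-shell resolve the package from Maven at startup (the coordinates below assume Scala 2.11):

spark-shell --packages com.databricks:spark-xml_2.11:0.4.1

// inside the shell, the same call now resolves the data source
val rd = spark.read.format("com.databricks.spark.xml").load("C:/Users/kumar/Desktop/d.xml")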