Loading an XML file in Spark

Posted: 2017-03-20 01:43:40

Tags: apache-spark apache-spark-sql

How do I load an XML file in Spark 2.0?

val rd = spark.read.format("com.databricks.spark.xml").load("C:/Users/kumar/Desktop/d.xml")

I get an error that com.databricks.spark.xml is not available:

java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
  at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
  ... 48 elided

2 Answers:

Answer 0 (score: 2)

A ClassNotFoundException means you need a fat jar: include the package in your build.sbt and generate the jar with sbt assembly. Give that a try; if it doesn't work, add the jar to $SPARK_HOME/jars and try again.
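As a sketch of the build.sbt approach (the artifact and version numbers below are assumptions; match the spark-xml artifact to your Scala version), the dependencies could look like this:

// build.sbt -- minimal sketch; versions are illustrative assumptions
name := "spark-xml-demo"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Spark itself is marked "provided" so it is not bundled into the fat jar
  "org.apache.spark" %% "spark-sql" % "2.0.2" % "provided",
  // the Databricks XML data source that registers com.databricks.spark.xml
  "com.databricks"   %% "spark-xml" % "0.4.1"
)

Running sbt assembly (with the sbt-assembly plugin enabled in project/plugins.sbt) then produces a single jar to pass to spark-submit. For quick experiments, an alternative is to let Spark fetch the package itself: spark-shell --packages com.databricks:spark-xml_2.11:0.4.1.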

Answer 1 (score: 1)

Alternatively, you can add the jar file to the spark shell directly. Download the spark-xml_2.10-0.2.0.jar file, copy it into Spark's classpath, and add it to the spark shell with the :cp command:

:cp spark-xml_2.10-0.2.0.jar  
/*
  The jar is now on the shell's classpath and can be used
  anywhere in your code inside the spark shell.
*/
val rd = spark.read.format("com.databricks.spark.xml").load("C:/Users/kumar/Desktop/d.xml")
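Whichever way the jar is supplied, spark-xml usually also needs a rowTag option telling it which XML element maps to one row (the default is ROW). A minimal sketch, assuming the repeating element in d.xml is named record:

val rd = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "record")  // one DataFrame row per <record> element; adjust to your file
  .load("C:/Users/kumar/Desktop/d.xml")

rd.printSchema()  // inspect the schema spark-xml inferred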