Jupyter Notebook + PySpark: unable to load the spark-xml package

Asked: 2019-05-19 20:04:25

Tags: xml apache-spark pyspark jupyter-notebook

from pyspark import SparkContext
sc = SparkContext()
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
from os import environ
environ['PYSPARK_SUBMIT_ARGS'] = '--packages --packages com.databricks:spark-xml_2.12:0.5.0 pyspark-shell' 

ds = sqlContext.read.format('com.databricks.spark.xml').option('rowTag', 'row').load('src/main/resources/Tags.xml')

ds.show()

I put the code above into a Jupyter cell, and the 'com.databricks.spark.xml' package does not seem to be loaded at all. What should I do to load XML files with PySpark in Jupyter? I am using Manjaro.

The error is:

Py4JJavaError: An error occurred while calling o24.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark.apache.org/third-party-projects.html
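
A minimal sketch of a likely fix, assuming two things visible in the snippet itself: PYSPARK_SUBMIT_ARGS is only read when the JVM is launched, so it has to be set before the first SparkContext is created, and the doubled --packages flag is a typo:

from os import environ

# Must run before any SparkContext exists; once the JVM is up,
# changing PYSPARK_SUBMIT_ARGS has no effect. Note the single
# --packages flag (the original cell passed it twice).
environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.12:0.5.0 pyspark-shell'

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)

# spark-xml maps every XML element matching rowTag to one DataFrame row
ds = (sqlContext.read.format('com.databricks.spark.xml')
      .option('rowTag', 'row')
      .load('src/main/resources/Tags.xml'))

ds.show()

One more thing worth checking: the _2.12 suffix must match the Scala version of the installed Spark build; the Spark 2.4.x binaries commonly distributed at the time were Scala 2.11 builds, for which com.databricks:spark-xml_2.11:0.5.0 would be the matching artifact. Alternatively, the package can be requested when launching the notebook from a shell, e.g. PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook pyspark --packages com.databricks:spark-xml_2.12:0.5.0, so no environment juggling is needed inside the cell.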

0 Answers:

No answers yet