from os import environ
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)

environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.12:0.5.0 pyspark-shell'

ds = sqlContext.read.format('com.databricks.spark.xml').option('rowTag', 'row').load('src/main/resources/Tags.xml')
ds.show()
I put the code above into a Jupyter cell, but the 'com.databricks.spark.xml' package does not seem to get loaded at all. What do I need to do to load XML files into Jupyter with PySpark? I am using Manjaro.

The error is:
Py4JJavaError: An error occurred while calling o24.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark.apache.org/third-party-projects.html
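
For reference, here is a minimal sketch of one likely fix, assuming the root cause is ordering: PYSPARK_SUBMIT_ARGS is only read when the JVM behind the SparkContext starts, so setting it after SparkContext() has already been created has no effect. Setting the variable before constructing the context should make Spark fetch the package. The package coordinates, file path, and rowTag are carried over unchanged from the question.

from os import environ

# Must be set BEFORE SparkContext() is called: spark-submit reads this
# environment variable when it launches the JVM, not afterwards.
environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.12:0.5.0 pyspark-shell'

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)

# Same read as in the question; path and rowTag carried over unchanged.
ds = (sqlContext.read
      .format('com.databricks.spark.xml')
      .option('rowTag', 'row')
      .load('src/main/resources/Tags.xml'))
ds.show()

If the kernel has already started a SparkContext, restart the kernel first so the environment variable is in place before the JVM launches.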