I am developing on CentOS 7 with Spark 2.3.0.
I have linked PySpark with Jupyter, but when I try to read an XML file I get an error:
df=pyspark.SQLContext(sc).read.format('com.databricks.spark.xml').options(rowTag='books').load('xm.xml')
Py4JJavaError Traceback (most recent call last)
<ipython-input-13-c0ea09e4b676> in <module>()
----> 1 df = pyspark.SQLContext(sc).read.format('com.databricks.spark.xml').options(rowTag='books').load('xm.xml')
/usr/lib/spark/python/pyspark/sql/readwriter.pyc in load(self, path, format,
schema, **options)
164 self.options(**options)
165 if isinstance(path, basestring):
--> 166 return self._df(self._jreader.load(path))
167 elif path is not None:
168 if type(path) != list:
...
Py4JJavaError: An error occurred while calling o61.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: ...
How can I add the com.databricks.spark.xml package so that it works for me?
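For reference, what I have seen suggested elsewhere is passing the package to Spark at launch time with `--packages` (the exact Maven coordinates below are my guess for Spark 2.3.0, which is built against Scala 2.11; I am not sure they are right):

```shell
# Sketch: launch PySpark with the spark-xml package pulled from Maven Central.
# Coordinates (artifact/version) are an assumption, not verified for my setup.
pyspark --packages com.databricks:spark-xml_2.11:0.4.1

# Or, for a Jupyter kernel, set the same flag before the SparkContext starts:
export PYSPARK_SUBMIT_ARGS="--packages com.databricks:spark-xml_2.11:0.4.1 pyspark-shell"
jupyter notebook
```

I do not know whether this has to be set before the Jupyter kernel starts, or whether there is a way to add the package to an already running SparkContext.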