Connecting PySpark 2.3.0 to MongoDB 3.x from PyCharm

Date: 2019-01-14 10:26:52

Tags: mongodb apache-spark pyspark

I am new to Spark. I have set up a local environment in PyCharm with Spark 2.3.0, Python 2.7, and MongoDB 3.6.8. I want to connect to MongoDB from Spark and read a collection's data, so I am using a Spark session to load the data into a DataFrame. My code is:

from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("PythonSQL")\
    .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_2.11:2.3.2')\
    .getOrCreate()

people1 = spark.read\
    .format("com.mongodb.spark.sql.DefaultSource")\
    .option("uri", "mongodb://127.0.0.1:27017/s3Data.hotels")\
    .load()
# show() prints the DataFrame itself and returns None, so there is no need to wrap it in print()
people1.show()

This creates the Spark session and reads from the local s3Data database and its hotels collection, but when I run it I get the following error:

Traceback (most recent call last):
File "/home/jarry/PycharmProjects/sparkDev/try.py", line 21, in <module>
.option("uri","mongodb://127.0.0.1:27017/s3Data.hotels")\
File "/home/jarry/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/readwriter.py", line 172, in load
return self._df(self._jreader.load())
File "/home/jarry/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
File "/home/jarry/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/home/jarry/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o51.load.
: java.lang.NoClassDefFoundError: com/mongodb/ConnectionString
at com.mongodb.spark.config.MongoCompanionConfig$$anonfun$4.apply(MongoCompanionConfig.scala:278)
at com.mongodb.spark.config.MongoCompanionConfig$$anonfun$4.apply(MongoCompanionConfig.scala:278)
at scala.util.Try$.apply(Try.scala:192)
at com.mongodb.spark.config.MongoCompanionConfig$class.connectionString(MongoCompanionConfig.scala:278)
at com.mongodb.spark.config.ReadConfig$.connectionString(ReadConfig.scala:39)
at com.mongodb.spark.config.ReadConfig$.apply(ReadConfig.scala:51)
at com.mongodb.spark.config.ReadConfig$.apply(ReadConfig.scala:39)
at com.mongodb.spark.config.MongoCompanionConfig$class.apply(MongoCompanionConfig.scala:124)
at com.mongodb.spark.config.ReadConfig$.apply(ReadConfig.scala:39)
at com.mongodb.spark.config.MongoCompanionConfig$class.apply(MongoCompanionConfig.scala:113)
at com.mongodb.spark.config.ReadConfig$.apply(ReadConfig.scala:39)
at com.mongodb.spark.sql.DefaultSource.createRelation(DefaultSource.scala:67)
at com.mongodb.spark.sql.DefaultSource.createRelation(DefaultSource.scala:50)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.mongodb.ConnectionString
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 28 more

0 Answers:

No answers yet.
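Although the question went unanswered, the NoClassDefFoundError for com.mongodb.ConnectionString generally means the MongoDB Java driver (a dependency of the connector) never made it onto the driver's classpath. When a script is run directly from an IDE such as PyCharm, spark.jars.packages set in the SparkSession builder can be applied too late for spark-submit to resolve and download the package. A minimal sketch of a commonly suggested workaround, assuming the same connector version as the question and that pyspark is importable in the project interpreter, is to pass --packages through the PYSPARK_SUBMIT_ARGS environment variable before the session is created:

import os
from pyspark.sql import SparkSession

# Assumption: the env var must be set before the JVM gateway is launched,
# i.e. before getOrCreate(); the trailing "pyspark-shell" token is required
# so that PySpark's launcher treats this as a shell-style submission.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages org.mongodb.spark:mongo-spark-connector_2.11:2.3.2 pyspark-shell"
)

spark = SparkSession\
    .builder\
    .appName("PythonSQL")\
    .getOrCreate()

people1 = spark.read\
    .format("com.mongodb.spark.sql.DefaultSource")\
    .option("uri", "mongodb://127.0.0.1:27017/s3Data.hotels")\
    .load()
people1.show()

Outside the IDE, running the script as spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.11:2.3.2 try.py achieves the same dependency resolution at launch time.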