Unable to connect to the MongoDB server from a PySpark client after enabling MongoDB's requireSSL configuration?

Time: 2019-05-21 03:54:45

Tags: python-3.x mongodb pyspark

On a CentOS 7 server, I am running a mongod service that I want to access with a PySpark client running on the same node, but the connection fails with: {MongoSocketReadException: Prematurely reached end of stream}

I followed the procedure documented at: https://docs.mongodb.com/spark-connector/current/python-api/

The mongod service is running with the following SSL configuration:

net:
  bindIp: 127.0.0.1
  port: 27017
  ssl:
      #mode: preferSSL
      mode: requireSSL
      PEMKeyFile: '/opt/mongodb_ssl/key_cert.pem'
      CAFile: '/opt/mongodb_ssl/caroot.pem' 
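
As a basic sanity check (this command is my own addition, not from the connector docs), the mongo shell can connect under requireSSL when pointed at the same CA file, which should confirm the server-side TLS setup is sound:

#mongo --ssl --sslCAFile /opt/mongodb_ssl/caroot.pem --host 127.0.0.1 --port 27017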

Now I connect to mongod from PySpark with the following command:

#./bin/pyspark --conf "spark.mongodb.input.uri=mongodb://username:password@127.0.0.1/kart.collection?ssl=true&readPreference=primaryPreferred" --conf "spark.mongodb.output.uri=mongodb://username:password@127.0.0.1/kart.collection?ssl=true" --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.0
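
The "Prematurely reached end of stream" error usually means the server closed the socket during the handshake, which under requireSSL is consistent with the client not actually completing TLS. Since the connector runs the MongoDB Java driver on the JVM, the server certificate is validated against the JVM truststore; if caroot.pem is a private CA (an assumption on my part, not stated above), one variant is to import it into a truststore and hand that to both the driver and the executors. The truststore path, the mongoCA alias, and the changeit password below are illustrative only:

keytool -importcert -trustcacerts -alias mongoCA -file /opt/mongodb_ssl/caroot.pem \
    -keystore /opt/mongodb_ssl/truststore.jks -storepass changeit

#./bin/pyspark \
    --conf "spark.mongodb.input.uri=mongodb://username:password@127.0.0.1/kart.collection?ssl=true&readPreference=primaryPreferred" \
    --conf "spark.mongodb.output.uri=mongodb://username:password@127.0.0.1/kart.collection?ssl=true" \
    --conf "spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=/opt/mongodb_ssl/truststore.jks -Djavax.net.ssl.trustStorePassword=changeit" \
    --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=/opt/mongodb_ssl/truststore.jks -Djavax.net.ssl.trustStorePassword=changeit" \
    --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.0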

From the PySpark shell, I then try to write data to mongod (the kart/myCollection namespace):

>>> people = spark.createDataFrame([("Bilbo Baggins",  50), ("Gandalf", 1000), ("Thorin", 195), ("Balin", 178), ("Kili", 77),
...    ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", None)], ["name", "age"])

>>> people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()

[Stage 0:>                                                          (0 + 2) / 2]19/05/21 08:19:49 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting to connect. Client view of cluster state is {type=UNKNOWN, servers=[{address=127.0.0.1:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadException: Prematurely reached end of stream}}]
    at com.mongodb.internal.connection.BaseCluster.getDescription(BaseCluster.java:179)
    at com.mongodb.internal.connection.SingleServerCluster.getDescription(SingleServerCluster.java:41)
    at com.mongodb.client.internal.MongoClientDelegate.getConnectedClusterDescription(MongoClientDelegate.java:136)
    at com.mongodb.client.internal.MongoClientDelegate.createClientSession(MongoClientDelegate.java:94)
    at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.getClientSession(MongoClientDelegate.java:249)
    at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:190)
    at com.mongodb.client.internal.MongoCollectionImpl.executeInsertMany(MongoCollectionImpl.java:520)
    at com.mongodb.client.internal.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:504)
    at com.mongodb.spark.MongoSpark$$anonfun$save$1$$anonfun$apply$1$$anonfun$apply$2.apply(MongoSpark.scala:119)
    at com.mongodb.spark.MongoSpark$$anonfun$save$1$$anonfun$apply$1$$anonfun$apply$2.apply(MongoSpark.scala:119)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at com.mongodb.spark.MongoSpark$$anonfun$save$1$$anonfun$apply$1.apply(MongoSpark.scala:119)
    at com.mongodb.spark.MongoSpark$$anonfun$save$1$$anonfun$apply$1.apply(MongoSpark.scala:118)
    at com.mongodb.spark.MongoConnector$$anonfun$withCollectionDo$1.apply(MongoConnector.scala:186)
    at com.mongodb.spark.MongoConnector$$anonfun$withCollectionDo$1.apply(MongoConnector.scala:184)
    at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
    at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
    at com.mongodb.spark.MongoConnector.withMongoClientDo(MongoConnector.scala:154)
    at com.mongodb.spark.MongoConnector.withDatabaseDo(MongoConnector.scala:171)
    at com.mongodb.spark.MongoConnector.withCollectionDo(MongoConnector.scala:184)
    at com.mongodb.spark.MongoSpark$$anonfun$save$1.apply(MongoSpark.scala:118)
    at com.mongodb.spark.MongoSpark$$anonfun$save$1.apply(MongoSpark.scala:117)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

If I change the mongod SSL mode to preferSSL, I can write to the database without any problem.

ssl:
      mode: preferSSL

However, the expected behavior is that users can connect to mongod from PySpark and run analytics queries with the SSL mode set to requireSSL. Since preferSSL also accepts non-TLS connections, the fact that it works suggests the PySpark client may not actually be negotiating TLS despite ssl=true in the URI.
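
For completeness, the same write can also be expressed with the URI passed per call rather than taken from the session configuration; as far as I can tell from the connector docs the DefaultSource accepts a uri option, so this sketch should at least rule out the session-level setting not being picked up:

>>> people.write.format("com.mongodb.spark.sql.DefaultSource") \
...     .mode("append") \
...     .option("uri", "mongodb://username:password@127.0.0.1/kart.collection?ssl=true") \
...     .save()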

0 Answers:

No answers