On a CentOS 7 server, I am running a mongod service that I want to access from a PySpark client running on the same node, but I am getting this error: {MongoSocketReadException: Prematurely reached end of stream}
Procedure reference: https://docs.mongodb.com/spark-connector/current/python-api/
The mongod service is running with the following SSL configuration:
net:
  bindIp: 127.0.0.1
  port: 27017
  ssl:
    #mode: preferSSL
    mode: requireSSL
    PEMKeyFile: '/opt/mongodb_ssl/key_cert.pem'
    CAFile: '/opt/mongodb_ssl/caroot.pem'
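To rule out the Spark layer, the TLS setup can first be verified with a plain pymongo connection from the same node. A minimal sketch, assuming the pymongo 3.x API and using the same placeholder credentials as in the Spark URI below:

# Sketch: check that the requireSSL mongod accepts a TLS client connection
# via pymongo (3.x API). Username, password, and paths are placeholders.
from pymongo import MongoClient

client = MongoClient(
    'mongodb://username:password@127.0.0.1:27017/kart',
    ssl=True,
    ssl_ca_certs='/opt/mongodb_ssl/caroot.pem'  # CA that signed the server certificate
)
print(client.admin.command('ismaster'))  # succeeds only if the TLS handshake completes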
Now, connecting to mongod through PySpark with the following command:
#./bin/pyspark --conf "spark.mongodb.input.uri=mongodb://username:password@127.0.0.1/kart.collection?ssl=true&readPreference=primaryPreferred" --conf "spark.mongodb.output.uri=mongodb://username:password@127.0.0.1/kart.collection?ssl=true" --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.0
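For reference, the same connector settings can also be applied programmatically instead of on the command line. A sketch using the standard SparkSession builder with the same URIs and package as above (this only takes effect in a fresh Python process, before the JVM has started):

# Sketch: equivalent programmatic setup of the MongoDB Spark connector URIs.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('mongo-ssl-test') \
    .config('spark.mongodb.input.uri',
            'mongodb://username:password@127.0.0.1/kart.collection'
            '?ssl=true&readPreference=primaryPreferred') \
    .config('spark.mongodb.output.uri',
            'mongodb://username:password@127.0.0.1/kart.collection?ssl=true') \
    .config('spark.jars.packages',
            'org.mongodb.spark:mongo-spark-connector_2.11:2.4.0') \
    .getOrCreate()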
From the PySpark shell, trying to write data to the mongod kart/myCollection database:
>>> people = spark.createDataFrame([("Bilbo Baggins", 50), ("Gandalf", 1000), ("Thorin", 195), ("Balin", 178), ("Kili", 77),
... ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", None)], ["name", "age"])
>>> people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()
[Stage 0:> (0 + 2) / 2]19/05/21 08:19:49 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting to connect. Client view of cluster state is {type=UNKNOWN, servers=[{address=127.0.0.1:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadException: Prematurely reached end of stream}}]
at com.mongodb.internal.connection.BaseCluster.getDescription(BaseCluster.java:179)
at com.mongodb.internal.connection.SingleServerCluster.getDescription(SingleServerCluster.java:41)
at com.mongodb.client.internal.MongoClientDelegate.getConnectedClusterDescription(MongoClientDelegate.java:136)
at com.mongodb.client.internal.MongoClientDelegate.createClientSession(MongoClientDelegate.java:94)
at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.getClientSession(MongoClientDelegate.java:249)
at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:190)
at com.mongodb.client.internal.MongoCollectionImpl.executeInsertMany(MongoCollectionImpl.java:520)
at com.mongodb.client.internal.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:504)
at com.mongodb.spark.MongoSpark$$anonfun$save$1$$anonfun$apply$1$$anonfun$apply$2.apply(MongoSpark.scala:119)
at com.mongodb.spark.MongoSpark$$anonfun$save$1$$anonfun$apply$1$$anonfun$apply$2.apply(MongoSpark.scala:119)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at com.mongodb.spark.MongoSpark$$anonfun$save$1$$anonfun$apply$1.apply(MongoSpark.scala:119)
at com.mongodb.spark.MongoSpark$$anonfun$save$1$$anonfun$apply$1.apply(MongoSpark.scala:118)
at com.mongodb.spark.MongoConnector$$anonfun$withCollectionDo$1.apply(MongoConnector.scala:186)
at com.mongodb.spark.MongoConnector$$anonfun$withCollectionDo$1.apply(MongoConnector.scala:184)
at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$1.apply(MongoConnector.scala:171)
at com.mongodb.spark.MongoConnector.withMongoClientDo(MongoConnector.scala:154)
at com.mongodb.spark.MongoConnector.withDatabaseDo(MongoConnector.scala:171)
at com.mongodb.spark.MongoConnector.withCollectionDo(MongoConnector.scala:184)
at com.mongodb.spark.MongoSpark$$anonfun$save$1.apply(MongoSpark.scala:118)
at com.mongodb.spark.MongoSpark$$anonfun$save$1.apply(MongoSpark.scala:117)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
If I change the mongod SSL configuration to preferSSL, I can write to the database without any problem.
ssl:
  mode: preferSSL
However, the expected behavior is that users should be able to connect to mongod from PySpark with the SSL mode set to requireSSL in order to run analytical queries.
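One assumption worth checking: with requireSSL, the connector's JVM presumably has to trust the server certificate. Is a JVM truststore configuration like the following expected to be required? A minimal sketch, assuming caroot.pem has first been imported into a JKS truststore (e.g. with keytool); the truststore path and the 'changeit' password are placeholders:

# Sketch (assumption): point the driver and executor JVMs at a truststore
# built from caroot.pem, e.g. created beforehand with:
#   keytool -importcert -file /opt/mongodb_ssl/caroot.pem \
#           -keystore /opt/mongodb_ssl/truststore.jks -storepass changeit -noprompt
from pyspark.sql import SparkSession

jvm_opts = ('-Djavax.net.ssl.trustStore=/opt/mongodb_ssl/truststore.jks '
            '-Djavax.net.ssl.trustStorePassword=changeit')

# Note: spark.driver.extraJavaOptions only takes effect if set before the
# driver JVM starts (fresh process, spark-defaults.conf, or spark-submit).
spark = SparkSession.builder \
    .config('spark.driver.extraJavaOptions', jvm_opts) \
    .config('spark.executor.extraJavaOptions', jvm_opts) \
    .getOrCreate()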