py4j.protocol.Py4JJavaError: An error occurred while calling o788.save. : com.mongodb.MongoTimeoutException, WritableServerSelector

Time: 2019-10-30 11:59:21

Tags: mongodb apache-spark pyspark pyspark-dataframes

PySpark version: 2.4.4. MongoDB version: 4.2.0. Memory: 64 GB. CPU cores: 32. Run command: spark-submit --executor-memory 8G --driver-memory 8G --packages org.mongodb.spark:mongo-spark-connector_2.11:2.3.1 demographic.py

I get the following error when running the code:

"py4j.protocol.Py4JJavaError: An error occurred while calling o764.save. : com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=172.*.*.*:27017, type=REPLICA_SET_SECONDARY, roundTripTime=34.3 ms, state=CONNECTED}]"

I am trying to read a MongoDB collection from a replica-set server that requires authentication, and I can read from it with the following command:

df_ipapp = spark.read.format('com.mongodb.spark.sql.DefaultSource').option('uri', '{}/{}.IpAppointment?authSource={}'.format(mongo_url, mongo_db,auth_source)).load()
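For reference, a minimal sketch of how the read URI in that command is assembled. All concrete values below (host, credentials, database names) are hypothetical placeholders, since the actual ones are not shown in the question:

```python
# Sketch: build the MongoDB connection URI used by the Spark connector read.
# Every concrete value here is an assumed placeholder, not from the real setup.
mongo_url = 'mongodb://readUser:readPass@replica-host:27017'  # assumed host/credentials
mongo_db = 'hospital'     # assumed source database name
auth_source = 'admin'     # database that holds the user's credentials

# Same pattern as the read above: <url>/<db>.<collection>?authSource=<authdb>
read_uri = '{}/{}.IpAppointment?authSource={}'.format(mongo_url, mongo_db, auth_source)
print(read_uri)
# -> mongodb://readUser:readPass@replica-host:27017/hospital.IpAppointment?authSource=admin
```

The resulting string is what gets passed to `.option('uri', ...)` in the `spark.read` call.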

and it works fine. After processing this dataframe, I write it to another MongoDB, which runs locally where I do the processing and has no authentication:

df.write.format('com.mongodb.spark.sql.DefaultSource').mode('overwrite').option('uri', '{}/{}.demographic'.format(mongo_final_url, mongo_final_db)).save()

This is where I keep getting the error:

  File "/home/svr_data_analytic/hmis-analytics-data-processing/src/main/python/sales/demographic.py", line 297, in save_n_rename
    .option('uri', '{}/{}.demographic'.format(mongo_url, mongo_final_db)).save()
  File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 736, in save
  File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/home/svr_data_analytic/spark/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o788.save.
: com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=172.*.*.*:27017, type=REPLICA_SET_SECONDARY, roundTripTime=0.8 ms, state=CONNECTED}]

Reading from the replica server:

df_bills = spark.read.format('com.mongodb.spark.sql.DefaultSource').option('uri', '{}/{}.Bills?authSource={}'.format(mongo_url, mongo_db, auth_source)).load()

Writing to MongoDB:

df.write.format('com.mongodb.spark.sql.DefaultSource').mode('overwrite').option('uri', '{}/{}.demographic'.format(mongo_final_url, mongo_final_db)).save()
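The write URI above can be sketched in the same way; again, the host and database names below are assumed placeholders. Note that the error text shows the client only sees a REPLICA_SET_SECONDARY member, while WritableServerSelector needs a writable (primary) member, so the URI's target matters here:

```python
# Sketch: build the write URI for the local, unauthenticated MongoDB.
# Both values are hypothetical placeholders.
mongo_final_url = 'mongodb://localhost:27017'  # assumed local target instance
mongo_final_db = 'analytics'                   # assumed target database name

# Same pattern as the write above: <url>/<db>.<collection>
write_uri = '{}/{}.demographic'.format(mongo_final_url, mongo_final_db)
print(write_uri)
# -> mongodb://localhost:27017/analytics.demographic
```

This string is what `.option('uri', ...)` receives in the `df.write` call before `.save()`.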

I want to read from the authenticated MongoDB replica server, process the dataframe, and write it to the local MongoDB. Thanks in advance.

0 Answers:

There are no answers yet.