I am hitting this error in Azure Databricks while trying to write a DataFrame to a MongoDB instance (not Atlas, but a stateless Kubernetes cluster in Azure that is reachable via IP).
I can reach my MongoDB via the mongo shell, and everything seems fine there.
I set my Spark cluster configuration to:
spark.mongodb.input.uri mongodb://<IP>:<Port>?replicaSet=MainRepSet
spark.mongodb.output.uri mongodb://<IP>:<Port>?replicaSet=MainRepSet
I am using pyspark on Databricks 5.4 (includes Apache Spark 2.4.3, Scala 2.11), with MongoDB 3.4.21 on Kubernetes 1.12.8. In Databricks I installed org.mongodb.spark:mongo-spark-connector_2.11:2.3.1. My guess is that this has nothing to do with MongoDB itself, but rather that some parameter is missing for the connector.
My minimal app (running in an Azure Databricks notebook):
from pyspark.sql import SparkSession

# Create (or reuse) the Spark session
my_spark = SparkSession.builder.appName("myApp").getOrCreate()

people = my_spark.createDataFrame(
    [("Bilbo Baggins", 50), ("Gandalf", 1000), ("Thorin", 195), ("Balin", 178),
     ("Kili", 77), ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82),
     ("Bombur", None)],
    ["name", "age"])

people.write.format("com.mongodb.spark.sql.DefaultSource") \
    .option("database", "test") \
    .option("collection", "test") \
    .mode("append") \
    .save()
The full error message:
IllegalArgumentException Traceback (most recent call last)
<command-2936697920073069> in <module>()
----> 1 people.write.format("com.mongodb.spark.sql.DefaultSource").option("database", "test").option("collection", "test").mode("append").save()
/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
730 self.format(format)
731 if path is None:
--> 732 self._jwrite.save()
733 else:
734 self._jwrite.save(path)
/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
1255 answer = self.gateway_client.send_command(command)
1256 return_value = get_return_value(
-> 1257 answer, self.gateway_client, self.target_id, self.name)
1258
1259 for temp_arg in temp_args:
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
77 raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
78 if s.startswith('java.lang.IllegalArgumentException: '):
---> 79 raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
80 raise
81 return deco
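To make my guess concrete: if the connector really does need the database name inside the URI itself (rather than only as a separate `database` option), the configured URI would need a path segment. Here is a minimal sketch of the URI I think I should be building — the host, port, and database values below are placeholders, not my real setup:

```python
# Sketch: compose a MongoDB connection URI that embeds the database name,
# assuming the Spark connector expects it in the URI itself.
def build_mongo_uri(host, port, database, replica_set):
    # Shape: mongodb://<host>:<port>/<database>?replicaSet=<replica_set>
    return f"mongodb://{host}:{port}/{database}?replicaSet={replica_set}"

# Placeholder values -- substitute the real IP/port of the Kubernetes service.
uri = build_mongo_uri("10.0.0.4", 27017, "test", "MainRepSet")
print(uri)  # mongodb://10.0.0.4:27017/test?replicaSet=MainRepSet
```

The resulting string would then replace the values of `spark.mongodb.input.uri` and `spark.mongodb.output.uri` in the cluster configuration above. Is a missing database name in the URI what triggers this `IllegalArgumentException`?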