
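To improve Spark's performance when reading data over JDBC (Postgres), the usual advice is to set a partition column together with lowerBound and upperBound. Many people also recommend increasing fetchSize (often quoted as defaulting to 10 rows). A number of non-Apache write-ups suggest something like the following: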

df = spark.read.jdbc(url=url, table="table", numPartitions=50, column="some_Column", lowerBound=1, upperBound=10000, fetchSize = 5000, properties=properties)
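In a call like the one above, column, lowerBound, upperBound and numPartitions only decide how the read is split into parallel queries; the bounds do not filter rows. The sketch below approximates in plain Python the per-partition WHERE clauses Spark derives from them. It is based on my reading of Spark's JDBCRelation.columnPartition, details may differ between versions, and partition_predicates is a hypothetical helper, not a Spark API.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Approximate the WHERE clause of each partition in a partitioned JDBC read.

    Hypothetical helper (not part of Spark). Assumes non-negative bounds:
    Python's // floors while Scala's / truncates, so negatives may differ.
    """
    if lower_bound > upper_bound:
        raise ValueError("lowerBound must not exceed upperBound")
    if num_partitions <= 1 or lower_bound == upper_bound:
        return [None]  # single partition, no WHERE clause at all
    # The partition count is capped when the range is narrower than requested.
    num_partitions = min(num_partitions, upper_bound - lower_bound)
    stride = upper_bound // num_partitions - lower_bound // num_partitions
    clauses = []
    current = lower_bound
    for i in range(num_partitions):
        # First partition has no lower limit, last has no upper limit,
        # so rows outside [lowerBound, upperBound] are still read.
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        upper = f"{column} < {current}" if i != num_partitions - 1 else None
        if upper is None:
            clauses.append(lower)
        elif lower is None:
            clauses.append(f"{upper} or {column} is null")
        else:
            clauses.append(f"{lower} AND {upper}")
    return clauses
```

With the values above (1 to 10000, 50 partitions) the stride is 200, so the first query covers `some_Column < 201 or some_Column is null` and the last covers `some_Column >= 9801`. A skewed or sparse column therefore produces unevenly sized partitions.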

However, passing fetchSize raises an error:

TypeError: jdbc() got an unexpected keyword argument 'fetchSize'

Can anyone suggest how to set fetchSize in Spark 2.3.1, or some other way to speed up the read?
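PySpark's DataFrameReader.jdbc() has no fetchSize keyword, which is why the call fails. A minimal sketch of one workaround, assuming Spark treats the entries of the properties dict as JDBC data source options (so the documented fetchsize option is picked up there); the helper name, the driver class and the default of 5000 are illustrative assumptions.

```python
def postgres_jdbc_properties(user, password, fetch_size=5000):
    """Build the `properties` dict for spark.read.jdbc(), including fetchsize.

    Hypothetical helper; the driver class and default fetch size are assumptions.
    """
    return {
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
        # Assumption: entries in `properties` are forwarded to Spark as JDBC
        # options, so "fetchsize" set here takes effect. Values are copied into
        # java.util.Properties, so they must be strings, not ints.
        "fetchsize": str(fetch_size),
    }
```

The original call would then drop the `fetchSize=5000` keyword and pass `properties=postgres_jdbc_properties(user, password)` instead, keeping `column`, `lowerBound`, `upperBound` and `numPartitions` as they were.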

How can I improve Spark 2.3.1 read speed over JDBC, especially from Postgres?
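Another option, sketched under the same assumptions, is the generic format("jdbc") reader, where every setting, fetchsize included, is passed as a data source option. The option names (url, dbtable, partitionColumn, lowerBound, upperBound, numPartitions, fetchsize) are Spark JDBC options; partitionColumn, lowerBound, upperBound and numPartitions must be given together. The function name, driver class and default fetch size are hypothetical.

```python
def partitioned_jdbc_reader(spark, url, table, column, lower, upper,
                            num_partitions, user, password, fetch_size=5000):
    """Return a DataFrameReader for a partitioned JDBC read; call .load() on it.

    Hypothetical helper; the driver class and default fetch size are assumptions.
    """
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .option("driver", "org.postgresql.Driver")
        # These four options only work together and control parallelism.
        .option("partitionColumn", column)
        .option("lowerBound", lower)
        .option("upperBound", upper)
        .option("numPartitions", num_partitions)
        # Rows fetched per round trip to the database.
        .option("fetchsize", fetch_size)
    )
```

For example, `df = partitioned_jdbc_reader(spark, url, "table", "some_Column", 1, 10000, 50, "user", "secret").load()` mirrors the original call with fetchsize set.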
