Tuning the Spark Context

Date: 2017-09-18 08:37:13

Tags: python apache-spark pyspark

I am developing a Python application in a Jupyter environment (an ipynb notebook). The environment runs inside a Docker container. My client connects to a Cassandra DB through the pyspark-cassandra driver, but execution is very slow (there may be some overhead). I found and changed spark.sql.shuffle.partitions, since this value defaults to 200. After the change the runtime went down, but it is still slow. What other Spark parameters are worth tuning?

spark.sql.shuffle.partitions = 4

Here is the Python code for the Spark configuration:

import pyspark
from pyspark import SparkConf
from pyspark.sql import SQLContext

# Build the Spark configuration from environment-provided values
conf = SparkConf()
conf.setMaster(environment.get('forecast.spark.master'))
conf.setAppName("Consolidate and Evaluate Data")
conf.set("spark.cassandra.connection.port", environment.get('cassandra.port'))
conf.set("spark.cassandra.connection.host", environment.get('cassandra.host'))
# Spark Context
sc = pyspark.SparkContext(conf=conf)
sqlContext = SQLContext(sc)
# Lower the shuffle parallelism from the default of 200
sqlContext.setConf("spark.sql.shuffle.partitions", environment.get('spark.sql.shuffle.partitions'))
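Beyond spark.sql.shuffle.partitions, executor sizing and the Cassandra connector's read settings are common levers for this kind of slowness. Below is a minimal sketch of settings one might try; the values are illustrative assumptions, not recommendations from the post, and the exact Cassandra connector property names depend on the connector version in use.

```python
# Sketch only: values are illustrative assumptions, tune for your cluster.
from pyspark import SparkConf

conf = SparkConf()
conf.setMaster("local[*]")  # hypothetical master URL for illustration
conf.setAppName("Consolidate and Evaluate Data")

# Resources per executor: undersized executors are a frequent cause of
# slow jobs, especially inside a memory-constrained Docker container.
conf.set("spark.executor.memory", "4g")
conf.set("spark.executor.cores", "2")

# Cassandra connector read tuning (property names vary by connector
# version; check your connector's reference configuration).
conf.set("spark.cassandra.input.split.sizeInMB", "64")

# Match shuffle parallelism to the data size, as done in the question:
# for small datasets, far fewer than the default 200 partitions.
conf.set("spark.sql.shuffle.partitions", "4")
```

For a small dataset on a single-node Docker setup, very high parallelism mostly adds task-scheduling overhead, which is why lowering the partition count helped in the question.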

0 Answers:

No answers yet