I have a question about the following code:
def skewTemperature(cloudantdata,spark):
    return spark.sql("""SELECT (1/count(temperature)) * (sum(POW(temperature-%s,3))/pow(%s,3)) as skew from washing""" %(meanTemperature(cloudantdata,spark),sdTemperature(cloudantdata,spark))).first().skew
meanTemperature and sdTemperature work fine, but the query above fails with the following error:
Py4JJavaError: An error occurred while calling o2849.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 315.0 failed 10 times, most recent failure: Lost task 3.9 in stage 315.0 (TID 1532, yp-spark-dal09-env5-0045): java.lang.RuntimeException: Database washing request error: {"error":"too_many_requests","reason":"You've exceeded your current limit of 5 requests per second for query class. Please try later.","class":"query","rate":5
Does anyone know how to fix this?
Answer 0 (score: 1)
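The error means you have exceeded the Cloudant API call threshold for the query class, which appears to be 5 requests per second on the service plan you are using.
One possible solution is to limit the number of partitions by setting the jsonstore.rdd.partitions configuration property, as in the following Spark 2 example: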
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("Cloudant Spark SQL Example in Python using dataframes")\
    .config("cloudant.host","ACCOUNT.cloudant.com")\
    .config("cloudant.username", "USERNAME")\
    .config("cloudant.password","PASSWORD")\
    .config("jsonstore.rdd.partitions", 5)\
    .getOrCreate()
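Start with 5 and, if the error persists, work down by 1 at a time. This setting essentially limits how many concurrent requests are sent to Cloudant. If setting it to 1 does not resolve the issue, you may need to consider upgrading to a service plan with a higher threshold.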
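The "start at 5 and work down by 1" procedure can be sketched as a small helper. This is only an illustration, not part of the Cloudant connector: first_working_partition_count and run_query are hypothetical names, and run_query is assumed to be a function you write that builds a fresh session with the given jsonstore.rdd.partitions value (stopping any earlier session first, since getOrCreate() otherwise returns the existing one) and then runs the failing query.

```python
def first_working_partition_count(run_query, start=5):
    """Call run_query(n) for n = start, start-1, ..., 1.

    Returns (n, result) for the first call that does not raise.
    run_query is a hypothetical callable supplied by the caller; it is
    assumed to apply n as jsonstore.rdd.partitions and run the query.
    """
    last_error = None
    for partitions in range(start, 0, -1):
        try:
            return partitions, run_query(partitions)
        except Exception as err:
            # In real use, narrow this to the rate-limit failure
            # (the "too_many_requests" error shown in the question).
            last_error = err
    # Even a single partition was rate-limited: the plan limit is too low.
    raise RuntimeError(
        "query still fails with 1 partition; consider a plan with a higher limit"
    ) from last_error
```

You would call it as first_working_partition_count(my_run_query), where my_run_query wraps the session setup shown above plus the skewTemperature call. Once a working value is found, it is probably simpler to hard-code it in the session configuration than to keep probing on every run.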