How do I calculate a mean with Spark SQL in Python?

Posted: 2019-05-16 11:38:18

Tags: pyspark apache-spark-sql mean pyspark-sql

.count() works, but .sum() raises an error. What should I do?

Code:

def meanTemperature(df,spark):
    counttemp=spark.sql("SELECT temperature  from washing").count()
    sumtemp=spark.sql("SELECT temperature from washing").sum()
    mean=sumtemp/counttemp
    return mean

Error: AttributeError: 'DataFrame' object has no attribute 'sum'

1 answer

Answer (score: 0)

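DataFrame has no sum() method, which is why you get the error; sum exists only as an aggregate function (in SQL, and as pyspark.sql.functions.sum). The simplest fix is to let Spark SQL compute the mean or the median directly.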

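For the mean:

def meanTemperature(df, spark):
    # mean() is an alias of avg() and takes exactly one column
    meanTemp = spark.sql("SELECT mean(temperature) FROM washing")
    return meanTemp.collect()[0][0]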
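
Dividing a sum by .count(), as the question does, can disagree with mean() when temperature contains NULLs: SQL aggregates such as sum, avg/mean and count(column) skip NULLs, whereas DataFrame.count() counts every row. The following plain-Python sketch models NULL as None so it runs without Spark; the function names are illustrative, and edge cases such as an empty table are not modeled.

```python
from typing import Optional, Sequence


def sql_avg(values: Sequence[Optional[float]]) -> Optional[float]:
    """Model of SQL avg()/mean(): NULLs (None) are ignored; all-NULL input gives NULL."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    return sum(non_null) / len(non_null)


def naive_sum_over_row_count(values: Sequence[Optional[float]]) -> float:
    """Model of the question's approach: the sum skips NULLs, but the divisor
    (like DataFrame.count()) counts every row, NULL or not."""
    total = sum(v for v in values if v is not None)
    return total / len(values)


temps = [20.0, None, 24.0, None]
print(sql_avg(temps))                   # 22.0
print(naive_sum_over_row_count(temps))  # 11.0
```

Without NULLs the two agree; with them, the naive version is pulled toward zero.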

If you want the median instead:

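def medianTemperature(df, spark):  # illustrative name, mirrors meanTemperature
    # percentile_approx(col, 0.5) returns an approximate 50th percentile, i.e. the median
    medianTemp = spark.sql("SELECT percentile_approx(temperature, 0.5) FROM washing")
    return medianTemp.collect()[0][0]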