How to compute a moving average

Time: 2019-06-14 16:43:36

Tags: python apache-spark pyspark

I am having trouble with rangeBetween and cannot work out how to compute a moving average. This is the error I get:

u"cannot resolve 'RANGE BETWEEN CAST(-1L AS STRING) FOLLOWING AND CAST(1L AS STRING) FOLLOWING' due to data type mismatch: The data type of the lower bound 

Please help!

import numpy as np
from pyspark.sql.functions import col, collect_list, udf
from pyspark.sql.types import FloatType
from pyspark.sql.window import Window

# Select the pump running-time readings, ordered by date.
exemple_moving_avg = spark.sql(
    "select DATE_VALEUR, VALEUR, LIBELLE_UNITE, LIBELLE_PARAMETRE, LIBELLE_STATION "
    "from tv_cln "
    "where LIBELLE_EQUIPEMENT='Pompe de relèvement' "
    "and LIBELLE_PARAMETRE = 'Temps de fonctionnement' "
    "and LIBELLE_STATION !='/' order by DATE_VALEUR"
)

# The rangeBetween frame ordered by DATE_VALEUR is what the error above complains about.
w = Window.orderBy(col("DATE_VALEUR")).rangeBetween(-1, 1)
# Median of the values collected in each window frame.
median_udf = udf(lambda x: float(np.median(x)), FloatType())
exemple_moving_avg.withColumn("list", collect_list("VALEUR").over(w)) \
    .withColumn("rolling_median", median_udf("list")).show(truncate=False)

0 answers:

No answers