OK, I'm having trouble with rangeBetween and I can't work out how to get a moving average. This is the error I get:
u"cannot resolve 'RANGE BETWEEN CAST(-1L AS STRING) FOLLOWING AND CAST(1L AS STRING) FOLLOWING' due to data type mismatch: The data type of the lower bound
Please help!
from pyspark.sql.functions import col, collect_list, udf
from pyspark.sql import functions as F
from pyspark.sql.window import Window
import numpy as np
from pyspark.sql.types import FloatType
exemple_moving_avg=spark.sql("select DATE_VALEUR, VALEUR, LIBELLE_UNITE, LIBELLE_PARAMETRE, LIBELLE_STATION from tv_cln where LIBELLE_EQUIPEMENT='Pompe de relèvement' and LIBELLE_PARAMETRE = 'Temps de fonctionnement' and LIBELLE_STATION !='/' order by DATE_VALEUR")
w = (Window.orderBy(col("DATE_VALEUR")).rangeBetween(-1, 1))
median_udf = udf(lambda x: float(np.median(x)), FloatType())
exemple_moving_avg.withColumn("list", collect_list("VALEUR").over(w)) \
.withColumn("rolling_median", median_udf("list")).show(truncate = False)
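One possible reading of the error: the casts to STRING in the message suggest that DATE_VALEUR is a string column, while rangeBetween with numeric bounds needs a numeric ORDER BY expression. If the intended window is simply "previous row, current row, next row", rowsBetween(-1, 1) counts rows by position and does not depend on the type of the ordering column. The sketch below is a minimal illustration under those assumptions. The helper names median_of_window and add_rolling_median are hypothetical, and VALEUR is assumed to be numeric.

```python
import statistics


def median_of_window(values):
    """Median of one window's values; None for an empty or missing window."""
    if not values:
        return None
    return float(statistics.median(values))


def add_rolling_median(df, order_col="DATE_VALEUR", value_col="VALEUR"):
    """Attach a 3-row rolling median (previous, current, next row) to df.

    Assumes value_col holds numeric values (hypothetical helper).
    """
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType
    from pyspark.sql.window import Window

    # rowsBetween counts physical rows around the current one, so the
    # ordering column may be a string or a date.
    w = Window.orderBy(F.col(order_col)).rowsBetween(-1, 1)
    median_udf = F.udf(median_of_window, DoubleType())
    return (
        df.withColumn("list", F.collect_list(value_col).over(w))
          .withColumn("rolling_median", median_udf("list"))
    )
```

With the DataFrame from the question, this would be called as add_rolling_median(exemple_moving_avg).show(truncate=False). If the window should cover a span of time rather than a number of rows, an alternative is to order by a numeric column (for example the timestamp cast to seconds) and keep rangeBetween with bounds expressed in that unit.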