PySpark window functions

Time: 2019-06-03 18:10:56

Tags: python-3.x pyspark

I have a dataset (df_1) with the following schema:

|-- Column1: string (nullable = true)
|-- Column2: string (nullable = true)
|-- Column3: long (nullable = true)
|-- Column4: double (nullable = true)

df_1 is of type "pyspark.sql.dataframe.DataFrame".

I want to create a new column, Rank, that ranks the rows according to a defined window (security_window):

import pyspark.sql.functions as F
from pyspark.sql import Window
window = Window.partitionBy(F.col("Column1"), F.col("Column2")).orderBy(F.col("Column3")).rangeBetween(-20, 0)

df_1.withColumn('Rank',F.rank().over(window))

However, when I apply this window function to the DataFrame above (df_1), I get the following AnalysisException. Does anyone know what causes it?

pyspark.sql.utils.AnalysisException: Window Frame RANGE BETWEEN 20 PRECEDING AND CURRENT ROW must match the required frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

0 answers:

No answers