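I have a dataset with many columns, and I want to apply several aggregate functions to each of them. For example:
Columns: ['source_bytes', 'source_packets', 'rate']
Functions: ['avg', 'stddev']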
The result should be computed over a rolling window and produce new columns named
source_bytes_avg, source_bytes_stddev, source_packets_avg, source_packets_stddev
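The output names are the Cartesian product of the column list and the function list. Here is a minimal plain-Python sketch, assuming the `<column>_<function>` pattern from the example above. The helper name `rolling_feature_names` is hypothetical.

```python
from itertools import product

def rolling_feature_names(columns, functions):
    """Return one '<column>_<function>' name per (column, function) pair,
    grouped by column in the order the columns are given."""
    return [f"{col}_{fn}" for col, fn in product(columns, functions)]

names = rolling_feature_names(["source_bytes", "source_packets", "rate"], ["avg", "stddev"])
print(names)
```

With three columns and two functions this yields six names. The last two are `rate_avg` and `rate_stddev`.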
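I have already set up the rolling window, but I'd like to know how to apply it efficiently to many columns. This is what I have for a single column: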
```python
from pyspark.sql import Window
import pyspark.sql.functions as F

# Trailing 30-minute window per source IP (timestamp is in epoch seconds)
w = (Window
     .partitionBy("source_ip")
     .orderBy(F.col("timestamp"))
     .rangeBetween(-1800, 0))

flows_filtered_v2_df2 = (flows_filtered_v2_df
    .withColumn("timestamp", F.unix_timestamp(F.to_timestamp("start_time")))
    .withColumn("src_bytes_avg_30min", F.avg("source_bytes").over(w))
    .withColumn("src_bytes_std_30min", F.stddev("source_bytes").over(w)))
```