I need to do a SUMPRODUCT in Spark SQL. The first input to the sumproduct has to be a fixed range of values from 0 to 5. Can this be done?
# create a df with some random data:
import pandas as pd
import random

cols = ['y_vals']
df = pd.DataFrame(columns=cols)
for i in range(0, 20):
    df.loc[i, 'y_vals'] = random.uniform(0, 1)

sparkdf = spark.createDataFrame(df)
sparkdf.createOrReplaceTempView('tempsql')
# do the SQL sumproduct (the "[0,1,2,3,4,5]" part is what I need to get working):
df_sql = spark.sql("""
select y_vals,
       (sum([0,1,2,3,4,5] * y_vals) over (rows between 6 preceding and 1 preceding)) as sumprod_6m
from tempsql
""")
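For what it's worth, here is one way this could be approached (a sketch, not run against a live cluster): Spark SQL will not multiply an array literal by a column inside `sum()`, but because the weight vector is fixed at six elements, the sumproduct over the six preceding rows can be unrolled into explicit `lag()` terms. The `idx` ordering column below is an assumption, since window functions need a deterministic order and the sample data has none. The pandas part at the end checks the same weighted-sum logic locally.

```python
import numpy as np
import pandas as pd

# Sketch of the unrolled sumproduct in Spark SQL. Assumes a hypothetical
# ordering column `idx`; lag(y_vals, 6) is the oldest row in the window,
# so it gets weight 0, and lag(y_vals, 1) gets weight 5.
spark_sumprod_sql = """
select y_vals,
       0 * lag(y_vals, 6) over (order by idx)
     + 1 * lag(y_vals, 5) over (order by idx)
     + 2 * lag(y_vals, 4) over (order by idx)
     + 3 * lag(y_vals, 3) over (order by idx)
     + 4 * lag(y_vals, 2) over (order by idx)
     + 5 * lag(y_vals, 1) over (order by idx) as sumprod_6m
from tempsql
"""

# The same weighted sum verified in plain pandas: shift(1) drops the
# current row, matching "rows between 6 preceding and 1 preceding".
weights = np.array([0, 1, 2, 3, 4, 5])
y = pd.Series(range(10), dtype=float)
sumprod = y.shift(1).rolling(6).apply(lambda w: float(np.dot(weights, w)), raw=True)
print(sumprod.iloc[6])  # dot([0,1,2,3,4,5], [0,1,2,3,4,5]) = 55.0
```

If the rows carry a timestamp or id already, that column would replace `idx`; otherwise `monotonically_increasing_id()` gives an ordering within a partition, though it is only guaranteed to be increasing, not consecutive.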