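I want to run the same code over different sets of macro variables, as I would in SAS, and then append all of the resulting tables together. Since I come from a SAS background, I am confused about how to do this in a PySpark environment. Any help is much appreciated!
Sample code is below: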
lastyear_st=201615
lastyear_end=201622
thisyear_st=201715
thisyear_end=201722
customer_spend = sqlContext.sql("""
select a.customer_code,
sum(case when a.week_id between %d and %d then a.spend else 0 end) as spend
from tableA a
group by a.customer_code
"""
% (lastyear_st, lastyear_end))
# the same query then needs to run again with (thisyear_st, thisyear_end)
Answer 0 (score: 1)
import os
import tempfile

# macroVars holds your start and end values as a list of lists,
# where each inner list contains one [start, end] pair.
macroVars = [[201615, 201622], [201715, 201722]]

# Create the output location once so that every iteration appends to the same path.
output_path = os.path.join(tempfile.mkdtemp(), 'data')

# Loop through the list of lists.
for start, end in macroVars:
    # Prepare the query using the values of start and end.
    query = """
        SELECT a.customer_code,
               SUM(CASE WHEN a.week_id BETWEEN {} AND {}
                        THEN a.spend ELSE 0 END) AS spend
        FROM tablea a
        GROUP BY a.customer_code
    """.format(start, end)
    # Execute the query.
    customer_spend = sqlContext.sql(query)
    # Depending on your base table setup, use the appropriate write command, for example:
    customer_spend \
        .write.mode('append') \
        .parquet(output_path)
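If the goal is one combined table in memory rather than files on disk, the per-period results can also be unioned directly. The following is only a minimal sketch under a few assumptions: `sqlContext` is an existing `SQLContext` (a `SparkSession` exposes the same `sql` method), `tableA` has the columns `customer_code`, `week_id`, and `spend`, and the result is grouped by `customer_code`. The helper names `build_query` and `spend_for_periods`, and the extra `period_start` column used to tell the periods apart, are illustrative and not part of the original post.

```python
from functools import reduce

# Query template; {start} and {end} are filled in per period.
# period_start is a hypothetical label column so appended rows stay distinguishable.
QUERY_TEMPLATE = """
    SELECT a.customer_code,
           {start} AS period_start,
           SUM(CASE WHEN a.week_id BETWEEN {start} AND {end}
                    THEN a.spend ELSE 0 END) AS spend
    FROM tableA a
    GROUP BY a.customer_code
"""


def build_query(start, end):
    """Return the SQL text for one (start, end) week range."""
    # int() guards against anything other than plain week ids reaching the SQL.
    return QUERY_TEMPLATE.format(start=int(start), end=int(end))


def spend_for_periods(sql_context, periods):
    """Run the query once per period and stack the results into one DataFrame."""
    frames = [sql_context.sql(build_query(start, end)) for start, end in periods]
    # DataFrame.union appends rows by column position (Spark 2.0+).
    return reduce(lambda left, right: left.union(right), frames)


# Example call, assuming a live sqlContext:
# combined = spend_for_periods(sqlContext, [(201615, 201622), (201715, 201722)])
# combined.show()
```

Building the list of DataFrames first and reducing it with `union` plays roughly the role of a SAS macro loop followed by `PROC APPEND`; Spark only evaluates the combined result when an action such as `show()` or a write is called.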