Looping Python parameters into SQL code

Asked: 2019-07-19 20:02:54

Tags: python sql for-loop

I need to create the following report in a way that scales:

query = """
(SELECT
    '02/11/2019' as Week_of,
    media_type,
    campaign,
    count(ad_start_ts) as frequency
FROM usotomayor.digital 
WHERE ds between 20190211 and 20190217
GROUP BY 1,2,3)
UNION ALL
(SELECT
    '02/18/2019' as Week_of,
    media_type,
    campaign,
    count(ad_start_ts) as frequency
FROM usotomayor.digital 
WHERE ds between 20190211 and 20190224
GROUP BY 1,2,3)


"""

#Converting to dataframe
query2 = spark.sql(query).toPandas()
query2

However, as you can see, this report cannot be made scalable if I have a long list of dates for each SQL query that needs to be unioned.

My first attempt at looping a list of date variables into the SQL script looks like this:

dfys = ['20190217','20190224']

df2 = ['02/11/2019','02/18/2019']

for i in df2:
    date=i

for j in dfys:
    date2=j

query = f"""
SELECT
    '{date}' as Week_of,
    raw.media_type,
    raw.campaign,
    count(raw.ad_start_ts) as frequency
FROM usotomayor.digital raw 
WHERE raw.ds between 20190211 and {date2}
GROUP BY 1,2,3

"""

#Converting to dataframe
query2 = spark.sql(query).toPandas()
query2

However, this does not work for me. I think I need to loop over the SQL query itself, but I don't know how to do that. Can anyone help me?

1 answer:

Answer 0 (score: 1)

As a commenter said, "this does not work for me" is not very specific, so let's start by specifying the problem. You need to run one query for each pair of dates, which means executing those queries in a loop and saving the results (or actually unioning them, but then the query logic needs to change).
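If you do want a single combined result, one option along those lines is to generate each weekly SELECT in Python and join them with UNION ALL before a single spark.sql call. A minimal sketch of just the string-building step (table and column names are taken from the question; only the query assembly is shown here):

```python
dfys = ['20190217', '20190224']        # end dates for the ds filter
df2 = ['02/11/2019', '02/18/2019']     # Week_of labels

# Build one parenthesised subquery per (label, end date) pair
subqueries = []
for week_of, end_date in zip(df2, dfys):
    subqueries.append(f"""(SELECT
    '{week_of}' as Week_of,
    media_type,
    campaign,
    count(ad_start_ts) as frequency
FROM usotomayor.digital
WHERE ds between 20190211 and {end_date}
GROUP BY 1,2,3)""")

# Join the subqueries exactly like the hand-written report
query = "\nUNION ALL\n".join(subqueries)
# The combined query can then be run once: spark.sql(query).toPandas()
```

Adding a new week is then just a matter of appending one more entry to each list.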

You can do it like this:

dfys = ['20190217', '20190224']

df2 = ['02/11/2019', '02/18/2019']

query_results = list()
# Pair each Week_of label with its corresponding end date
for week_of, end_date in zip(df2, dfys):
    query = f"""
    SELECT
        '{week_of}' as Week_of,
        raw.media_type,
        raw.campaign,
        count(raw.ad_start_ts) as frequency
    FROM usotomayor.digital raw
    WHERE raw.ds between 20190211 and {end_date}
    GROUP BY 1,2,3
    """
    query_results.append(spark.sql(query).toPandas())

query_results[0]  # report for the week of 02/11/2019
query_results[1]  # report for the week of 02/18/2019

Now you have a list of results (query_results).
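If a single report is wanted from that list, the weekly frames can be stacked with pandas, which is the DataFrame equivalent of the SQL UNION ALL. A minimal sketch with small stand-in frames (the media_type and campaign values are made-up placeholders) in place of the real spark.sql(...).toPandas() results:

```python
import pandas as pd

# Stand-ins for the per-week DataFrames collected in query_results
week1 = pd.DataFrame({"Week_of": ["02/11/2019"], "media_type": ["display"],
                      "campaign": ["spring"], "frequency": [10]})
week2 = pd.DataFrame({"Week_of": ["02/18/2019"], "media_type": ["display"],
                      "campaign": ["spring"], "frequency": [25]})
query_results = [week1, week2]

# Stack the weekly results into one report, renumbering the index
report = pd.concat(query_results, ignore_index=True)
print(report)
```

The Week_of column then distinguishes the rows from each query, just as it does in the hand-written UNION ALL version.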