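I have several SQL queries to run (via pandas.io.sql / .read_sql) that share a very similar structure, so I am trying to parameterize them.
I would like to know whether there is a way to pass in column values using .format (which works for strings).
My query (truncated to simplify this post):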
sql= '''
SELECT DISTINCT
CAST(report_suite AS STRING) AS report_suite, post_pagename,
COUNT(DISTINCT(CONCAT(post_visid_high,post_visid_low))) AS unique_visitors
FROM
FOO.db
WHERE
date_time BETWEEN '{0}' AND '{1}'
AND report_suite = '{2}'
GROUP BY
report_suite, post_pagename
ORDER BY
unique_visitors DESC
'''.format(*parameters)
What I would like to do is parameterize COUNT(DISTINCT(CONCAT(post_visid_high, post_visid_low))) as Unique Visitors
in some way like this:
COUNT(DISTINCT({3})) as {'4'}
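The problem I can't seem to get around is that, to do this, the column names would have to be stored as something other than strings to avoid the quotes. Is there a good way to do this?
Answer 0 (score: 2)
Consider the following approach: splice the column expression and its alias into the SQL text with .format, and pass the actual values as bound query parameters: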
import pandas as pd

sql_dynamic_parms = dict(
func1='CONCAT(post_visid_high,post_visid_low)',
name1='unique_visitors'
)
sql= '''
SELECT DISTINCT
CAST(report_suite AS STRING) AS report_suite, post_pagename,
COUNT(DISTINCT({func1})) AS {name1}
FROM
FOO.db
WHERE
date_time BETWEEN %(date_from)s AND %(date_to)s
AND report_suite = %(report_suite)s
GROUP BY
report_suite, post_pagename
ORDER BY
{name1} DESC
'''.format(**sql_dynamic_parms)
params = dict(
date_from=pd.to_datetime('2017-01-01'),
date_to=pd.to_datetime('2017-12-01'),
report_suite=111
)
df = pd.read_sql(sql, conn, params=params)
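Since .format pastes its arguments straight into the SQL text, it is worth restricting what can be pasted. Below is a minimal, self-contained sketch of the same two-step idea, run against an in-memory SQLite database so that it works without any external server. The table name hits, the sample rows, the ALLOWED_EXPRS allow-list and the build_query helper are all hypothetical and exist only for illustration. SQLite has no CONCAT here, so the sketch uses the || operator, and it uses the :name placeholder style that the sqlite3 driver accepts rather than %(name)s.

```python
import re
import sqlite3

# Hypothetical allow-list: the only expressions this sketch will splice into SQL text.
ALLOWED_EXPRS = {
    "visitor_id": "post_visid_high || post_visid_low",
    "pagename": "post_pagename",
}
# Aliases must look like plain SQL identifiers.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

SQL_TEMPLATE = """
SELECT report_suite, post_pagename,
       COUNT(DISTINCT {expr}) AS {alias}
FROM hits
WHERE date_time BETWEEN :date_from AND :date_to
  AND report_suite = :report_suite
GROUP BY report_suite, post_pagename
ORDER BY {alias} DESC
"""


def build_query(expr_key, alias):
    """Fill in the structural parts (expression, alias); values stay as placeholders."""
    if expr_key not in ALLOWED_EXPRS:
        raise ValueError("unknown expression key: %r" % (expr_key,))
    if not IDENTIFIER_RE.fullmatch(alias):
        raise ValueError("invalid alias: %r" % (alias,))
    return SQL_TEMPLATE.format(expr=ALLOWED_EXPRS[expr_key], alias=alias)


# Made-up sample data, only so the query has something to run against.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (report_suite TEXT, post_pagename TEXT, "
    "post_visid_high TEXT, post_visid_low TEXT, date_time TEXT)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
    [
        ("111", "home", "a", "1", "2017-03-01"),
        ("111", "home", "a", "1", "2017-03-02"),   # same visitor again
        ("111", "home", "b", "2", "2017-03-03"),
        ("111", "about", "a", "1", "2017-04-01"),
        ("222", "home", "c", "3", "2017-05-01"),   # different report suite
        ("111", "home", "d", "4", "2018-01-01"),   # outside the date range
    ],
)

sql = build_query("visitor_id", "unique_visitors")
params = {"date_from": "2017-01-01", "date_to": "2017-12-01", "report_suite": "111"}

# The values are bound by the driver, never formatted into the SQL string.
rows = conn.execute(sql, params).fetchall()
print(rows)
```

The same sql and params should also be usable with pd.read_sql(sql, conn, params=params) on this connection. For another database, swap the placeholder style for whatever its driver expects.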
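P.S. You may want to read PEP 249 to see which parameter placeholder styles are accepted; the style to use depends on your database driver.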
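As a quick illustration of that point, PEP 249 asks every driver module to expose a module-level paramstyle attribute. The sketch below checks it for the standard-library sqlite3 module, chosen here only because it needs no setup; your own driver may report a different style such as pyformat, which is the %(name)s form used in the query above.

```python
import sqlite3

# PEP 249: each DB-API module declares its preferred placeholder style.
print(sqlite3.paramstyle)

conn = sqlite3.connect(":memory:")

# Positional "qmark" placeholders, with the values given as a tuple.
qmark_row = conn.execute("SELECT ? + ?", (1, 2)).fetchone()

# sqlite3 also accepts named placeholders, with the values given as a dict.
named_row = conn.execute("SELECT :a + :b", {"a": 1, "b": 2}).fetchone()

print(qmark_row, named_row)
```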