Question

I have several SQL queries to run (via pandas.io.sql / .read_sql) that share a very similar structure, so I'm trying to parameterize them.

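I'm wondering whether there is a way to pass columns (names or expressions) in with .format, the same way it already works for string values.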

My query (truncated to keep this post short):

sql= '''
SELECT DISTINCT 
    CAST(report_suite AS STRING) AS report_suite, post_pagename,
    COUNT(DISTINCT(CONCAT(post_visid_high,post_visid_low))) AS unique_visitors 
FROM 
    FOO.db
WHERE 
    date_time BETWEEN '{0}' AND '{1}'
    AND report_suite = '{2}'
GROUP BY 
    report_suite, post_pagename
ORDER BY 
    unique_visitors DESC
'''.format(*parameters)

What I'd like to do is parameterize the COUNT(DISTINCT(CONCAT(post_visid_high, post_visid_low))) AS unique_visitors expression,

in a form like this:

COUNT(DISTINCT({3})) AS {4}

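The problem I can't seem to get around is that, to do this, the column names would apparently have to be stored as something other than strings so that they don't end up in quotes. Is there a good way to do this?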
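A minimal sketch of why quoting is not really the obstacle: str.format inserts a string's characters verbatim, so quotes only appear where the template itself contains them (as in '{2}' above). Column names and expressions cannot be sent as bound query parameters, so they have to be spliced into the SQL text; to keep that safe, the sketch looks them up in a whitelist. The names ALLOWED_METRICS, TEMPLATE and count_distinct_clause are hypothetical, chosen for illustration only.

```python
# Hypothetical whitelist: metric alias -> trusted SQL expression.
# Identifiers/expressions can't be bound parameters, so only trusted text goes here.
ALLOWED_METRICS = {
    "unique_visitors": "CONCAT(post_visid_high, post_visid_low)",
}

# No quotes around the placeholders, so none appear in the output.
TEMPLATE = "COUNT(DISTINCT({expr})) AS {alias}"


def count_distinct_clause(alias: str) -> str:
    """Build the COUNT(DISTINCT ...) select item for a whitelisted metric."""
    try:
        expr = ALLOWED_METRICS[alias]
    except KeyError:
        raise ValueError(f"unknown metric: {alias!r}") from None
    # str.format inserts both strings verbatim; it never adds quotes itself.
    return TEMPLATE.format(expr=expr, alias=alias)


print(count_distinct_clause("unique_visitors"))
# COUNT(DISTINCT(CONCAT(post_visid_high, post_visid_low))) AS unique_visitors

# The quotes in the question's query come from the template, not from format():
print("'{0}'".format("post_pagename"))  # 'post_pagename'
print("{0}".format("post_pagename"))    # post_pagename
```

Rejecting unknown aliases means that arbitrary text (for example user input) never reaches the SQL string through this path.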

Answer 1

Consider the following approach:

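import pandas as pd

# Column names / SQL expressions can't be bound as query parameters,
# so they are inserted into the SQL text with str.format().
# Only put trusted values here (never raw user input).
sql_dynamic_parms = dict(
  func1='CONCAT(post_visid_high,post_visid_low)',
  name1='unique_visitors'
)

# Values (dates, report suite) stay as driver placeholders and are passed
# separately via params=, so the driver handles quoting and escaping.
# The %(name)s ("pyformat") style must be one your DB-API driver accepts.
sql= '''
SELECT DISTINCT 
    CAST(report_suite AS STRING) AS report_suite, post_pagename,
    COUNT(DISTINCT({func1})) AS {name1} 
FROM 
    FOO.db
WHERE 
    date_time BETWEEN %(date_from)s AND %(date_to)s
    AND report_suite = %(report_suite)s
GROUP BY 
    report_suite, post_pagename
ORDER BY 
    {name1} DESC
'''.format(**sql_dynamic_parms)

params = dict(
  date_from=pd.to_datetime('2017-01-01'),
  date_to=pd.to_datetime('2017-12-01'),
  report_suite=111
)

# conn is an existing database connection
df = pd.read_sql(sql, conn, params=params)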
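To see the same two-step pattern run end to end, here is a self-contained sketch against an in-memory SQLite database. The table name hits and its rows are made up for illustration and stand in for FOO.db. Because this is SQLite, a few details differ from the query above: values use SQLite's :name placeholders, the visitor ID is built with || instead of CONCAT, and the CAST is dropped.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data standing in for FOO.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (report_suite TEXT, post_pagename TEXT, "
    "post_visid_high TEXT, post_visid_low TEXT, date_time TEXT)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
    [
        ("rs1", "home", "1", "a", "2017-03-01"),
        ("rs1", "home", "1", "a", "2017-03-02"),  # same visitor again
        ("rs1", "home", "2", "b", "2017-03-05"),
        ("rs1", "cart", "1", "a", "2017-04-01"),
        ("rs2", "home", "3", "c", "2017-04-01"),  # other report suite
        ("rs1", "home", "4", "d", "2018-01-01"),  # outside the date range
    ],
)

# Step 1: trusted column expression and alias go into the SQL text via format().
sql_dynamic_parms = dict(
    func1="post_visid_high || '-' || post_visid_low",  # SQLite string concat
    name1="unique_visitors",
)

sql = """
SELECT report_suite, post_pagename,
       COUNT(DISTINCT {func1}) AS {name1}
FROM hits
WHERE date_time BETWEEN :date_from AND :date_to
  AND report_suite = :report_suite
GROUP BY report_suite, post_pagename
ORDER BY {name1} DESC
""".format(**sql_dynamic_parms)

# Step 2: values are bound by the driver (SQLite's named style).
params = {"date_from": "2017-01-01", "date_to": "2017-12-01", "report_suite": "rs1"}
df = pd.read_sql(sql, conn, params=params)
print(df)
```

With this sample data, home should show 2 unique visitors and cart 1; the rs2 row and the 2018 row are filtered out by the bound values.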

PS: you may want to read PEP 249 to find out which parameter placeholder style your driver accepts.
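PEP 249 defines five paramstyle values, and every DB-API module exposes the one it uses as a module-level paramstyle attribute. A small sketch follows; the helper placeholder() is hypothetical and only shows what a placeholder looks like in each style. Note that sqlite3 reports 'qmark' but also accepts the named style.

```python
import sqlite3


def placeholder(paramstyle: str, name: str, position: int) -> str:
    """Hypothetical helper: render one placeholder in a given PEP 249 style."""
    styles = {
        "qmark": "?",                # WHERE x = ?
        "numeric": f":{position}",   # WHERE x = :1
        "named": f":{name}",         # WHERE x = :date_from
        "format": "%s",              # WHERE x = %s
        "pyformat": f"%({name})s",   # WHERE x = %(date_from)s
    }
    return styles[paramstyle]  # KeyError for an unknown style


print(sqlite3.paramstyle)                          # qmark
print(placeholder("pyformat", "date_from", 1))     # %(date_from)s
print(placeholder(sqlite3.paramstyle, "date_from", 1))  # ?
```

The answer's %(date_from)s placeholders are the pyformat style, so they only work with drivers that report (or also accept) pyformat.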

Passing column values into a SQL query's SELECT using pandas.io.sql