Passing column values into the SELECT clause of a SQL query with pandas.io.sql

Time: 2017-12-07 15:46:41

Tags: python sql pandas

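I have several SQL queries to run (through pandas.io.sql / .read_sql) that share a very similar structure, so I am trying to parameterize them.

I would like to know whether there is a way to pass in column names and expressions with .format, the way it already works for string values.

My query (truncated to keep this post short):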

sql= '''
SELECT DISTINCT 
    CAST(report_suite AS STRING) AS report_suite, post_pagename,
    COUNT(DISTINCT(CONCAT(post_visid_high,post_visid_low))) AS unique_visitors 
FROM 
    FOO.db
WHERE 
    date_time BETWEEN '{0}' AND '{1}'
    AND report_suite = '{2}'
GROUP BY 
    report_suite, post_pagename
ORDER BY 
    unique_visitors DESC
'''.format(*parameters)

What I would like to do is parameterize COUNT(DISTINCT(CONCAT(post_visid_high, post_visid_low))) as Unique Visitors

in some way like this:

COUNT(DISTINCT({3})) as {'4'}

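The problem I can't seem to solve is that, for this to work, the column name would have to be stored as something other than a string so that it does not end up in quotes. Is there a good way to do this?

1 answer:

Answer 0: (score: 2)

Consider the following approach: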

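import pandas as pd

# Column expressions and aliases cannot be bound as query parameters,
# so they are filled in with str.format; use trusted values only.
sql_dynamic_parms = dict(
  func1='CONCAT(post_visid_high,post_visid_low)',
  name1='unique_visitors'
)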

sql= '''
SELECT DISTINCT 
    CAST(report_suite AS STRING) AS report_suite, post_pagename,
    COUNT(DISTINCT({func1})) AS {name1} 
FROM 
    FOO.db
WHERE 
    date_time BETWEEN %(date_from)s AND %(date_to)s
    AND report_suite = %(report_suite)s
GROUP BY 
    report_suite, post_pagename
ORDER BY 
    {name1} DESC
'''.format(**sql_dynamic_parms)

# Values are bound by the database driver through the %(name)s placeholders.
params = dict(
  date_from=pd.to_datetime('2017-01-01'),
  date_to=pd.to_datetime('2017-12-01'),
  report_suite=111
)

df = pd.read_sql(sql, conn, params=params)
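
Because the column expression and alias are pasted into the SQL text with .format, it may be worth restricting them to a known set. Below is a minimal sketch of that idea, not part of the original answer. It runs against an in-memory SQLite database so that it is self-contained; the table name hits, the sample rows, the allow-lists and the helper build_count_query are all made up for illustration, and SQLite's || operator and :name placeholders stand in for CONCAT and %(name)s.

```python
import sqlite3

# Hypothetical allow-lists: the only expressions and aliases a caller may request.
ALLOWED_EXPRESSIONS = {
    "visitor_id": "post_visid_high || post_visid_low",
    "page": "post_pagename",
}
ALLOWED_ALIASES = {"unique_visitors", "unique_pages"}


def build_count_query(expr_key, alias):
    """Fill in the identifier parts; leave the value placeholders for the driver."""
    if expr_key not in ALLOWED_EXPRESSIONS or alias not in ALLOWED_ALIASES:
        raise ValueError("expression or alias is not on the allow-list")
    return (
        "SELECT report_suite, COUNT(DISTINCT {expr}) AS {alias} "
        "FROM hits "
        "WHERE date_time BETWEEN :date_from AND :date_to "
        "AND report_suite = :report_suite "
        "GROUP BY report_suite "
        "ORDER BY {alias} DESC"
    ).format(expr=ALLOWED_EXPRESSIONS[expr_key], alias=alias)


# Small in-memory table standing in for the real data (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (report_suite TEXT, post_pagename TEXT, "
    "post_visid_high TEXT, post_visid_low TEXT, date_time TEXT)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
    [
        ("111", "home", "a", "1", "2017-02-01"),
        ("111", "home", "a", "1", "2017-03-01"),
        ("111", "cart", "b", "2", "2017-04-01"),
        ("222", "home", "c", "3", "2017-05-01"),
    ],
)

sql = build_count_query("visitor_id", "unique_visitors")
params = {"date_from": "2017-01-01", "date_to": "2017-12-01", "report_suite": "111"}
rows = conn.execute(sql, params).fetchall()
print(rows)
```

The same sql string and params dict could presumably be handed to pd.read_sql(sql, conn, params=params) instead of conn.execute, since pandas passes params through to the driver.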

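P.S. You may want to read PEP 249 to see which parameter placeholder styles are accepted. The style depends on the database driver; %(name)s, used above, is the "pyformat" style.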
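
As a quick illustration of that note, PEP 249 drivers expose a module-level paramstyle attribute naming the placeholder style they expect. The sketch below looks it up for the standard-library sqlite3 module; the lookup table and the helper placeholder_for are hypothetical names added here, while the five style names come from PEP 249.

```python
import sqlite3

# The five paramstyle names defined by PEP 249, each with an example placeholder.
PLACEHOLDER_EXAMPLES = {
    "qmark": "?",
    "numeric": ":1",
    "named": ":report_suite",
    "format": "%s",
    "pyformat": "%(report_suite)s",
}


def placeholder_for(driver_module):
    """Return an example placeholder for a DB-API driver module (hypothetical helper)."""
    return PLACEHOLDER_EXAMPLES[driver_module.paramstyle]


style = sqlite3.paramstyle
example = placeholder_for(sqlite3)
print(style, example)
```

For another driver, import its module and pass that in place of sqlite3; the placeholders in the query text then need to match the style it reports.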