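I want to use groupby on a Pandas DataFrame to produce SQL query strings whose WHERE clause takes multiple parameters, one query per group. What is the best way to do this?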
import pandas as pd

df = pd.DataFrame({
    'Contact Name': ['John Doe', 'John Doe', 'Jane Doe', 'Jim Doe', 'Jim Doe'],
    'Email Address': ['john.doe@gmail.com', 'john.doe@gmail.com', 'jane.doe@gmail.com', 'jim.doe@gmail.com', 'jim.doe@gmail.com'],
    'Contract No': ['2851', '2852', '2853', '2854', '2855'],
})
From the example above, I need to get 3 different SQL queries, as follows:
SELECT * FROM TABLE WHERE [Contract No] IN ('2851', '2852')
SELECT * FROM TABLE WHERE [Contract No] IN ('2853')
SELECT * FROM TABLE WHERE [Contract No] IN ('2854', '2855')
Answer 0 (score: 1)
Let's use parametrized SQL to give hackers less of an entryway into our databases:
sqls = []
args = []
# One query per (Contact Name, Email Address) group
for key, grp in df.groupby(['Contact Name', 'Email Address']):
    # The contract numbers become the query parameters
    arg = tuple(grp['Contract No'])
    # Emit one %s placeholder per parameter
    sql = 'SELECT * FROM TABLE WHERE [Contract No] IN ({})'.format(','.join(['%s'] * len(arg)))
    sqls.append(sql)
    args.append(arg)

for sql, arg in zip(sqls, args):
    print(sql, arg)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s) ('2853',)
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2854', '2855')
# SELECT * FROM TABLE WHERE [Contract No] IN (%s,%s) ('2851', '2852')
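To execute the parametrized SQL, use the 2-argument form of cursor.execute. Note that the placeholder style depends on the database driver: %s is used by drivers such as psycopg2 and MySQLdb, while pyodbc and sqlite3 expect ? instead.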
# cursor is assumed to be an open DB-API cursor
for sql, arg in zip(sqls, args):
    cursor.execute(sql, arg)
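For anyone who wants to try the parametrized approach end to end without a database server, here is a minimal sketch using the standard-library sqlite3 module. The in-memory table name contracts and the results dictionary are assumptions made for illustration only, and the placeholders are ? rather than %s because that is what sqlite3 expects.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({
    'Contact Name': ['John Doe', 'John Doe', 'Jane Doe', 'Jim Doe', 'Jim Doe'],
    'Email Address': ['john.doe@gmail.com', 'john.doe@gmail.com', 'jane.doe@gmail.com',
                      'jim.doe@gmail.com', 'jim.doe@gmail.com'],
    'Contract No': ['2851', '2852', '2853', '2854', '2855'],
})

# Hypothetical stand-in table so the queries have something to run against.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE contracts ("Contract No" TEXT)')
conn.executemany('INSERT INTO contracts VALUES (?)',
                 [(c,) for c in df['Contract No']])

results = {}
for key, grp in df.groupby(['Contact Name', 'Email Address']):
    arg = tuple(grp['Contract No'])
    # sqlite3 uses the "qmark" paramstyle: one ? per parameter.
    placeholders = ','.join(['?'] * len(arg))
    sql = 'SELECT * FROM contracts WHERE "Contract No" IN ({})'.format(placeholders)
    rows = conn.execute(sql, arg).fetchall()
    # Sort so the result does not depend on the table scan order.
    results[key] = sorted(row[0] for row in rows)

for key, contracts in results.items():
    print(key, contracts)
```

Only the number of placeholders is formatted into the SQL text; the contract numbers themselves are passed separately, so the driver handles quoting.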
Answer 1 (score: 0)
Figured out a solution. I just needed to use a lambda function together with groupby.
import pandas as pd

df1 = pd.DataFrame({
    'Contact Name': ['John Doe', 'John Doe', 'Jane Doe', 'Jim Doe', 'Jim Doe'],
    'Email Address': ['john.doe@gmail.com', 'john.doe@gmail.com', 'jane.doe@gmail.com', 'jim.doe@gmail.com', 'jim.doe@gmail.com'],
    'Contract No': ['2851', '2852', '2853', '2854', '2855'],
})

# Wrap each contract number in single quotes and join them per group
df2 = df1.groupby(['Contact Name', 'Email Address'])['Contract No'].apply(lambda x: ','.join("'" + x + "'")).reset_index()

for index, row in df2.iterrows():
    print('SELECT * FROM TABLE WHERE [Contract No] IN (' + row['Contract No'] + ')')
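One caveat with building the literal list by string concatenation: a value that itself contains a single quote would break the query. If the strings really must be inlined, a rough sketch like the following doubles embedded single quotes, which is the standard SQL way to escape them inside a string literal. The helper name quote_literal is hypothetical, and this is not a substitute for parametrized queries, since escaping rules can differ between databases.

```python
import pandas as pd


def quote_literal(value):
    # Hypothetical helper: wrap the value in single quotes and double
    # any embedded single quotes (standard SQL string-literal escaping).
    return "'" + str(value).replace("'", "''") + "'"


df1 = pd.DataFrame({
    'Contact Name': ['John Doe', 'John Doe', 'Jane Doe', 'Jim Doe', 'Jim Doe'],
    'Email Address': ['john.doe@gmail.com', 'john.doe@gmail.com', 'jane.doe@gmail.com',
                      'jim.doe@gmail.com', 'jim.doe@gmail.com'],
    'Contract No': ['2851', '2852', '2853', '2854', '2855'],
})

# One comma-separated list of quoted literals per group.
in_lists = (
    df1.groupby(['Contact Name', 'Email Address'])['Contract No']
       .apply(lambda s: ', '.join(quote_literal(v) for v in s))
)

queries = ['SELECT * FROM TABLE WHERE [Contract No] IN ({})'.format(in_list)
           for in_list in in_lists]

for query in queries:
    print(query)
```

The groups come out sorted by key (Jane Doe, Jim Doe, John Doe), since groupby sorts group keys by default.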