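I'm new to data science in Python.
I have a SQL Server extract: I connect with 'pyodbc.connect' and read the data from SQL Server with pd.read_sql(..... SQL query).
My goal is to use a list or vector (examples below) in the WHERE condition of the SQL query, so that we don't pull millions of rows into memory. How do I do that?
I'd like to know how to pass both a list of numbers and a list of strings (each has a different use case).
First WHERE condition, strings: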
raw_data2 = {'age1': ['ten','twenty']}
df2 = pd.DataFrame(raw_data2, columns = ['age1'])
Second WHERE condition, numbers:
raw_data2 = {'age_num': [10,20,30]}
df3 = pd.DataFrame(raw_data2, columns = ['age_num'])
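Thanks for your help; this would cut our fetch time by 80%.
Answer 0 (score: 1)
Consider using pandas' read_sql and passing the values as parameters (params), so the driver handles quoting and data types for you instead of you splicing the values into the SQL string. Also, keep all the data frames in one dictionary whose keys match the original raw_data keys, rather than cluttering the global environment with many separate data frames: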
import pandas as pd
import pyodbc

# conn is your open pyodbc connection, e.g. conn = pyodbc.connect(...) as in the question

raw_data = {'age1': ['ten','twenty'],
            'age_num': [10, 20, 30]}

df_dict = {}
for k, v in raw_data.items():
    # BUILD PREPARED STATEMENT WITH PARAM PLACEHOLDERS
    where = '{col} IN ({prm})'.format(col=k, prm=", ".join(['?' for _ in v]))
    sql = 'SELECT * FROM mytable WHERE {}'.format(where)
    print(sql)

    # IMPORT INTO DATAFRAME
    df_dict[k] = pd.read_sql(sql, conn, params = v)

# OUTPUT TOP ROWS OF EACH DF ELEM
print(df_dict['age1'].head())
print(df_dict['age_num'].head())
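The same placeholder technique can be pulled out of the loop into a small reusable helper. This is a minimal sketch, not part of the original answer: the helper name build_in_query is hypothetical, and an in-memory SQLite database stands in for SQL Server only because Python's built-in sqlite3 module uses the same `?` placeholder style as pyodbc, so the example runs without a server. With SQL Server you would pass the returned sql and params to pd.read_sql(sql, conn, params=params) on your pyodbc connection.

```python
import sqlite3
from typing import Any, List, Sequence, Tuple


def build_in_query(table: str, column: str,
                   values: Sequence[Any]) -> Tuple[str, List[Any]]:
    """Build 'SELECT * FROM table WHERE column IN (?, ?, ...)' plus its params.

    Only the values are bound as parameters; table and column names cannot
    be parameters, so they are inserted as text and must come from trusted code.
    """
    if not values:
        # SQL Server rejects 'IN ()', so refuse an empty list up front
        raise ValueError("values must contain at least one item")
    placeholders = ", ".join("?" for _ in values)
    sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
    return sql, list(values)


# Stand-in database (hypothetical table): in-memory SQLite also uses '?'
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (age1 TEXT, age_num INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("ten", 10), ("twenty", 20), ("thirty", 30), ("forty", 40)],
)

# List of strings
sql, params = build_in_query("mytable", "age1", ["ten", "twenty"])
print(sql)  # SELECT * FROM mytable WHERE age1 IN (?, ?)
rows_by_str = conn.execute(sql, params).fetchall()

# List of numbers
sql, params = build_in_query("mytable", "age_num", [10, 20, 30])
rows_by_num = conn.execute(sql, params).fetchall()
```

Returning the SQL and the parameter list together keeps the number of placeholders and the number of values in sync, which is the usual source of errors with hand-built IN clauses.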
If you prefer separate data frame objects:
def build_query(my_dict):
    # Intended for a single-key dict: with several keys, only the
    # query for the last key is returned
    for k, v in my_dict.items():
        # BUILD PREPARED STATEMENT WITH PARAM PLACEHOLDERS IN WHERE CLAUSE
        where = '{col} IN ({prm})'.format(col=k, prm=", ".join(['?' for _ in v]))
        sql = 'SELECT * FROM mytable WHERE {}'.format(where)
    return sql

raw_data2 = {'age1': ['ten','twenty']}
# ASSIGNS QUERY
sql = build_query(raw_data2)
# IMPORT TO DATAFRAME PASSING PARAM VALUES
df2 = pd.read_sql(sql, conn, params = raw_data2['age1'])

raw_data3 = {'age_num': [10,20,30]}
# ASSIGNS QUERY
sql = build_query(raw_data3)
# IMPORT TO DATAFRAME PASSING PARAM VALUES
df3 = pd.read_sql(sql, conn, params = raw_data3['age_num'])
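One caveat for very long lists: SQL Server accepts at most 2100 parameters in a single request, so one IN clause cannot hold an arbitrarily long list of values. A possible workaround, sketched below under stated assumptions, is to run the query in batches and concatenate the results. The helper name read_sql_in_chunks, the table name and the default chunk size of 2000 are illustrative choices, not from the answer. The demo uses an in-memory SQLite database, which has the same `?` placeholders as pyodbc, and takes the values from a DataFrame column as in the question, calling .tolist() so NumPy integers become plain Python ints (sqlite3 refuses to bind NumPy scalars, and pyodbc may reject them as well).

```python
import sqlite3
from typing import Any, Iterable

import pandas as pd


def read_sql_in_chunks(conn, table: str, column: str, values: Iterable[Any],
                       chunk_size: int = 2000) -> pd.DataFrame:
    """Run 'WHERE column IN (...)' in batches and stack the results.

    chunk_size should stay below the server's parameter limit
    (2100 per request on SQL Server).
    """
    # Drop duplicate values (keeping order) so no row is matched by two chunks
    values = list(dict.fromkeys(values))
    frames = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
        frames.append(pd.read_sql(sql, conn, params=chunk))
    if not frames:
        # Empty input: nothing to query, return an empty frame
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Stand-in database with a hypothetical table of 5000 rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (age_num INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(n,) for n in range(5000)])

# Values held in a DataFrame column, as in the question
df3 = pd.DataFrame({"age_num": list(range(0, 5000, 2))})
result = read_sql_in_chunks(conn, "mytable", "age_num",
                            df3["age_num"].tolist(), chunk_size=500)
```

Each batch is still a fully parameterized query, so this keeps the filtering on the server while staying under the parameter limit.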