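I am currently iterating over a JSON response and inserting each row one at a time.
This is very slow, even when inserting only a few thousand rows.
What is the most efficient way to insert the data?
Here is my code.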
from module import usr, pwd, acct, db, schem, api_key
import requests
import snowflake.connector
import datetime

end_point = 'users'

def snowflake_connect():
    global cursor, mydb
    mydb = snowflake.connector.connect(
        user=usr,
        password=pwd,
        account=acct,
        database=db,
        schema=schem,
    )

def snowflake_insert(id, activated, name):
    global cursor
    # Note: a new connection and a new cursor are created for every row.
    snowflake_connect()
    cursor = mydb.cursor()
    sql_insert_query = """ INSERT INTO USERS(ID, ACTIVATED, NAME) VALUES (%s, %s, %s)"""
    insert_tuple = (id, activated, name)
    cursor.execute(sql_insert_query, insert_tuple)
    return cursor

def get_users():
    url = 'https://company.pipedrive.com/v1/{}?&api_token={}'.format(end_point,api_key)
    response = requests.request("GET", url).json()
    read_users(response)

def read_users(response):
    for data in response['data']:
        id = data['id']
        activated = data['activated']
        name = data['name']
        snowflake_insert(id, activated, name)

if __name__ == "__main__":
    snowflake_truncate()
    get_users()
    cursor.close()
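Answer 0 (score: 2)
As others have pointed out in the comments, for maximum efficiency (especially for continuous loading), the best practice is to load formatted data files directly into Snowflake rather than using INSERT statements.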
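As a rough illustration of the file-based approach, the sketch below writes the rows to a local CSV file, then issues a PUT to the table stage followed by a COPY INTO. The helper names (`write_rows_to_csv`, `bulk_load_statements`, `bulk_load`), the use of the `@%USERS` table stage, and the CSV file format options are assumptions made for this example, not something taken from the question. It also assumes a Linux/macOS style path without spaces, and a `cursor` obtained from an open Snowflake connection.

```python
import csv
import os
import tempfile


def write_rows_to_csv(rows, path):
    """Write (id, activated, name) tuples to a CSV file at `path`."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


def bulk_load_statements(path, table="USERS"):
    """Return the PUT and COPY INTO statements that load `path` into `table`.

    Assumption: the file is uploaded to the table's own stage (@%TABLE).
    """
    file_uri = "file://" + os.path.abspath(path)
    return [
        f"PUT {file_uri} @%{table} OVERWRITE = TRUE",
        f"COPY INTO {table} FROM @%{table} "
        "FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '\"') "
        "PURGE = TRUE",
    ]


def bulk_load(cursor, rows, table="USERS"):
    """Stage the rows as one file and load them with a single COPY INTO."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "users.csv")
        write_rows_to_csv(rows, path)
        for statement in bulk_load_statements(path, table):
            cursor.execute(statement)
```

With a real connection this might be called as `bulk_load(connection.cursor(), rows)`; the whole batch then costs two statements regardless of how many rows it contains.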
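However, the code in the question can also be improved further to minimise the overhead incurred per inserted row. Some key observations:
- The code creates a new connection object for every row, which is unnecessary.
- The code creates a new cursor for every INSERT statement, although cursor objects can be reused.
- A single INSERT statement can send multiple rows using the multi-value feature, which can also be expressed more simply using cursor.executemany(…).
- The qmark (?) parameter format is used to avoid potential SQL injection.

Revised version of the code: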
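from module import usr, pwd, acct, db, schem, api_key
import requests
import snowflake.connector
import datetime

# The INSERT below uses qmark (?) placeholders, so the connector's
# paramstyle has to be switched from its default before connecting.
snowflake.connector.paramstyle = 'qmark'

end_point = 'users'
MYDB = None

def snowflake_connect():
    # Create the connection once and reuse it on later calls.
    global MYDB
    if MYDB is None:
        MYDB = snowflake.connector.connect(
            user=usr,
            password=pwd,
            account=acct,
            database=db,
            schema=schem,
        )

def snowflake_insert_all(rows):
    snowflake_connect()
    cursor = MYDB.cursor()
    sql_insert_query = "INSERT INTO USERS(ID, ACTIVATED, NAME) VALUES (?, ?, ?)"
    # A single executemany() call sends every row instead of one INSERT per row.
    cursor.executemany(sql_insert_query, rows)
    return cursor

def get_users():
    url = 'https://company.pipedrive.com/v1/{}?&api_token={}'.format(end_point,api_key)
    response = requests.request("GET", url).json()
    read_users(response)

def read_users(response):
    # Build all rows up front so they can be inserted in one batch.
    all_data = [(data['id'], data['activated'], data['name']) for data in response['data']]
    snowflake_insert_all(all_data)

if __name__ == "__main__":
    # snowflake_truncate() is not shown in the question; it is assumed to be defined elsewhere.
    snowflake_truncate()
    get_users()
    if MYDB is not None:
        MYDB.close()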
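
The batching pattern itself can be tried without a Snowflake account. The sketch below is not Snowflake code: it swaps in Python's built-in `sqlite3` module, which follows the same DB-API and uses qmark placeholders natively, purely to show one connection, one cursor, and one `executemany()` call for all rows. The `insert_all` helper and the sample response are made up for this example.

```python
import sqlite3


def insert_all(connection, rows):
    """Insert every (id, activated, name) tuple with one executemany() call.

    Returns the number of rows inserted.
    """
    cursor = connection.cursor()
    try:
        cursor.executemany(
            "INSERT INTO USERS(ID, ACTIVATED, NAME) VALUES (?, ?, ?)", rows
        )
        connection.commit()
        return cursor.rowcount
    finally:
        # Release the cursor whether or not the insert succeeded.
        cursor.close()


if __name__ == "__main__":
    # In-memory database standing in for the real target table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE USERS (ID INTEGER, ACTIVATED INTEGER, NAME TEXT)")

    # Hypothetical payload shaped like the 'data' list in the question.
    response = {
        "data": [
            {"id": 1, "activated": True, "name": "Ann"},
            {"id": 2, "activated": False, "name": "Bob"},
        ]
    }
    rows = [(d["id"], d["activated"], d["name"]) for d in response["data"]]
    print(insert_all(conn, rows))
    conn.close()
```

Passing the connection in as an argument, rather than keeping it in a module-level global, also makes the insert function easy to exercise in isolation.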
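Note: I have focused here only on improving the Snowflake and DB-API interaction, but overall there are other problems with the way the script is written (variable and method naming, unnecessary use of globals, resource handling, etc.). If you want to improve the program further, Code Review can help.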