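Optimizing MS SQL inserts via pyodbc

Date: 2019-04-26 16:52:53

Tags: json sql-server python-3.x pyodbc

I am capturing JSON responses from API calls and inserting them into an MS SQL database. The API has about 500,000 data entries, and I can query 1,000 at a time.

The SQL insert query seems to take too long, and I think it may be related to the query concatenation and string conversion.

SQL columns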

self.query_headers = (
    "([ac_serial_reg_key],[activity_date],"
    "[aircraft_delivery_date],[activity_name],[activity_remark],"
    "[aircraft_category],[aircraft_financial_status],[aircraft_group],"
    "[aircraft_model],[aircraft_operational_status],[aircraft_registration_number],[aircraft_serial_number],"
    "[aircraft_type],[engine_model],[engine_type],[operator_name])"
)

The SQL values are then assembled by iterating over the JSON response

self.query_values = ''

for idx, set in enumerate(self.response.json()['results'], start=1):
    self.query_values += "("
    self.query_values += "'"+str(set['aircraft_registration_number'])+"_"+str(set['aircraft_serial_number'])+"',"
    self.query_values += self.validate_date(set['activity_date'])+","
    self.query_values += self.validate_date(set['aircraft_delivery_date'])+","
    self.query_values += "'"+str(set['activity_name'])+"',"
    self.query_values += "'"+str(set['activity_remark'])+"',"
    self.query_values += "'"+str(set['aircraft_category'])+"',"
    self.query_values += "'"+str(set['aircraft_financial_status'])+"',"
    self.query_values += "'"+str(set['aircraft_group'])+"',"
    self.query_values += "'"+str(set['aircraft_model'])+"',"
    self.query_values += "'"+str(set['aircraft_operational_status'])+"',"
    self.query_values += "'"+str(set['aircraft_registration_number'])+"',"
    self.query_values += "'"+str(set['aircraft_serial_number'])+"',"
    self.query_values += "'"+str(set['aircraft_type_lar'])+"',"
    self.query_values += "'"+str(set['engine_model'])+"',"
    self.query_values += "'"+str(set['engine_type'])+"',"
    self.query_values += "'"+str(set['operator_name'])+"'"
    self.query_values += ")"
    # Separate rows with a comma, except after the last row on the page
    if idx < self.response.json()['results_this_page']:
        self.query_values += ","

How can I best parse the returned JSON object to create a single (or multiple) insert statement?

Insert statement

self.query = "INSERT INTO "+self.database+" "+self.query_headers+" VALUES"+self.query_values
self.cursor.execute(self.query)
self.cnxn.commit()

1 answer:

Answer 0 (score: 3)

Going by the link in my comment, something like this can quickly insert hundreds of thousands of JSON values into a table:

JSON: 
N'[  
       { "id" : 2,"info": { "name": "John", "surname": "Smith" }, "age": 25 },  
       { "id" : 5,"info": { "name": "Jane", "surname": "Smith", "skills": ["SQL", "C#", "Azure"] }, "dob": "2005-11-04T12:00:00" }  
 ]'  

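You would parameterize the SQL script so that your front-end application populates the @json variable (sorry, I don't know what this looks like in Python, but it is the same as any parameterized SQL). Here is the script: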

INSERT INTO Person(id, fn, ln, age, dob, skill)
SELECT id, firstName, lastName, age, dateOfBirth, skill
FROM 
  OPENJSON(@json)  
  WITH (id int 'strict $.id',  
        firstName nvarchar(50) '$.info.name', 
        lastName nvarchar(50) '$.info.surname',  
        age int, 
        dateOfBirth datetime2 '$.dob',
        skills nvarchar(max) '$.info.skills' as json) 
  OUTER APPLY 
    OPENJSON( skills ) --to "recurse" into the skills array
    WITH( skill nvarchar(8) '$' )
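
For the Python side, a minimal sketch of the parameterization might look like the following. It assumes SQL Server 2016 or later (OPENJSON needs compatibility level 130), a pyodbc cursor, and a hypothetical table name and column types; only a subset of the question's columns is mapped to keep it short. pyodbc uses `?` as the parameter marker, so the whole page of results travels as one JSON string instead of being concatenated into the statement.

```python
import json

# Hypothetical table name and column types; adjust both to the real schema.
TABLE = "dbo.aircraft_activity"

INSERT_SQL = f"""
INSERT INTO {TABLE}
    (ac_serial_reg_key, activity_date, activity_name,
     aircraft_registration_number, aircraft_serial_number, operator_name)
SELECT
    CONCAT(aircraft_registration_number, '_', aircraft_serial_number),
    TRY_CONVERT(date, activity_date),  -- NULL when the text is not a valid date
    activity_name,
    aircraft_registration_number,
    aircraft_serial_number,
    operator_name
FROM OPENJSON(?)
WITH (
    aircraft_registration_number nvarchar(50),
    aircraft_serial_number       nvarchar(50),
    activity_date                nvarchar(50),
    activity_name                nvarchar(200),
    operator_name                nvarchar(200)
)
"""


def insert_page(cursor, results):
    """Send one page of API results to SQL Server as a single JSON parameter."""
    payload = json.dumps(results)
    # One round trip per page; the server shreds the JSON into rows.
    cursor.execute(INSERT_SQL, payload)
    return len(results)
```

With the question's objects, a call would presumably be `insert_page(self.cursor, self.response.json()['results'])` followed by `self.cnxn.commit()`. Because the values are never spliced into the SQL text, quotes inside a remark or operator name cannot break the statement.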

It is important to note that the OUTER APPLY causes Jane Smith's person data to be repeated once per skill:

5, Jane, Smith, ..., C#
5, Jane, Smith, ..., SQL
5, Jane, Smith, ..., Azure
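
To make the repetition concrete, here is a small pure-Python imitation of what the OUTER APPLY does to the sample JSON. It is only an illustration of the row shape, not what SQL Server executes, and the function name `expand_skills` is made up for this sketch.

```python
def expand_skills(people):
    """Yield one row per (person, skill), mimicking OUTER APPLY OPENJSON(skills)."""
    for person in people:
        info = person.get("info", {})
        # OUTER APPLY keeps a person who has no skills, with a NULL (None) skill.
        skills = info.get("skills") or [None]
        for skill in skills:
            yield (person["id"], info.get("name"), info.get("surname"),
                   person.get("age"), person.get("dob"), skill)


people = [
    {"id": 2, "info": {"name": "John", "surname": "Smith"}, "age": 25},
    {"id": 5, "info": {"name": "Jane", "surname": "Smith",
                       "skills": ["SQL", "C#", "Azure"]},
     "dob": "2005-11-04T12:00:00"},
]
rows = list(expand_skills(people))
```

John has no skills array, so he still appears once with an empty skill, while Jane appears three times. A CROSS APPLY would drop John entirely.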

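Your JSON may not be structured like this. If it is, you can either leave the nested array out (skip the OUTER APPLY) or use a WHERE clause to filter out the rows you are not interested in: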

INSERT INTO Person(id, fn, ln, age, dob, skill)
SELECT id, firstName, lastName, age, dateOfBirth, skill
FROM 
  OPENJSON(@json)  
  WITH (
         ...
        skills nvarchar(max) '$.info.skills' as json) 
  OUTER APPLY 
    OPENJSON( skills ) --to "recurse" into the skills array
    WITH( skill nvarchar(8) '$' )
WHERE skill = 'C#'