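I'm capturing JSON responses from API calls and inserting them into an MS SQL database. The API has roughly 500,000 data entries, and I can query 1,000 at a time.
The SQL insert query seems to take too long, and I think it may be related to the query concatenation and string conversion.
SQL columns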
# Adjacent string literals inside parentheses are joined into one string;
# a plain double-quoted literal cannot span several lines.
self.query_headers = (
    "([ac_serial_reg_key],[activity_date],"
    "[aircraft_delivery_date],[activity_name],[activity_remark],"
    "[aircraft_category],[aircraft_financial_status],[aircraft_group],"
    "[aircraft_model],[aircraft_operational_status],[aircraft_registration_number],[aircraft_serial_number],"
    "[aircraft_type],[engine_model],[engine_type],[operator_name]"
    ")"
)
The SQL values are then assembled by iterating over the JSON response
payload = self.response.json()  # parse the response body once, not on every iteration
self.query_values = ''
# "row" replaces the original loop variable "set", which shadowed the built-in type
for idx, row in enumerate(payload['results'], start=1):
    self.query_values += "("
    self.query_values += "'"+str(row['aircraft_registration_number'])+"_"+str(row['aircraft_serial_number'])+"',"
    self.query_values += self.validate_date(row['activity_date'])+","
    self.query_values += self.validate_date(row['aircraft_delivery_date'])+","
    self.query_values += "'"+str(row['activity_name'])+"',"
    self.query_values += "'"+str(row['activity_remark'])+"',"
    self.query_values += "'"+str(row['aircraft_category'])+"',"
    self.query_values += "'"+str(row['aircraft_financial_status'])+"',"
    self.query_values += "'"+str(row['aircraft_group'])+"',"
    self.query_values += "'"+str(row['aircraft_model'])+"',"
    self.query_values += "'"+str(row['aircraft_operational_status'])+"',"
    self.query_values += "'"+str(row['aircraft_registration_number'])+"',"
    self.query_values += "'"+str(row['aircraft_serial_number'])+"',"
    self.query_values += "'"+str(row['aircraft_type_lar'])+"',"
    self.query_values += "'"+str(row['engine_model'])+"',"
    self.query_values += "'"+str(row['engine_type'])+"',"
    self.query_values += "'"+str(row['operator_name'])+"'"
    self.query_values += ")"
    # separate the value tuples with commas, except after the last one
    if idx < payload['results_this_page']:
        self.query_values += ","
What is the best way to parse the returned JSON object to create a single (or multiple) insert statement?
Insert statement
self.query = "INSERT INTO "+self.database+" "+self.query_headers+" VALUES"+self.query_values
self.cursor.execute(self.query)
self.cnxn.commit()
Answer 0 (score: 3)
Going by the link in my comment, something like this will quickly insert hundreds of thousands of JSON values into a table:
JSON:
N'[
{ "id" : 2,"info": { "name": "John", "surname": "Smith" }, "age": 25 },
{ "id" : 5,"info": { "name": "Jane", "surname": "Smith", "skills": ["SQL", "C#", "Azure"] }, "dob": "2005-11-04T12:00:00" }
]'
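You would parameterize the SQL script so that your front-end application populates the @json variable (sorry, I don't know what this looks like in Python, but it is the same as any parameterized SQL). Here is the script: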
INSERT INTO Person(id, fn, ln, age, dob, skill)
SELECT id, firstName, lastName, age, dateOfBirth, skill --explicit list: SELECT * would also return the raw skills column
FROM
OPENJSON(@json)
WITH (id int 'strict $.id',
firstName nvarchar(50) '$.info.name',
lastName nvarchar(50) '$.info.surname',
age int,
dateOfBirth datetime2 '$.dob',
skills nvarchar(max) '$.info.skills' as json)
OUTER APPLY
OPENJSON( skills ) --to "recurse" into the skills array
WITH( skill nvarchar(8) '$' )
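For the Python side, here is a minimal sketch of how the batch could be passed as a single parameter. It assumes a DB-API cursor that uses "?" placeholders (as pyodbc does) in place of the @json variable; the function name insert_people is made up for illustration, and the statement is a variant of the script above with an explicit select list. OPENJSON needs SQL Server 2016 or later with database compatibility level 130 or higher.

```python
import json

# Variant of the script above: the @json variable becomes a "?" placeholder
# (qmark parameter style is an assumption about the driver in use).
PERSON_INSERT_SQL = """
INSERT INTO Person (id, fn, ln, age, dob, skill)
SELECT id, firstName, lastName, age, dateOfBirth, skill
FROM OPENJSON(?)
WITH (id int 'strict $.id',
      firstName nvarchar(50) '$.info.name',
      lastName nvarchar(50) '$.info.surname',
      age int,
      dateOfBirth datetime2 '$.dob',
      skills nvarchar(max) '$.info.skills' AS JSON)
OUTER APPLY OPENJSON(skills) WITH (skill nvarchar(8) '$')
"""


def insert_people(cursor, people):
    """Serialize the whole batch once and send it as one string parameter."""
    payload = json.dumps(people)
    # One round trip per batch; the server does the parsing and the insert.
    cursor.execute(PERSON_INSERT_SQL, (payload,))
    return payload
```

The point of the design is that no SQL text is built from the data: the statement stays constant and only the JSON string changes from batch to batch, which also avoids quoting problems with values that contain apostrophes.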
It's important to note that the OUTER APPLY causes Jane Smith's person data to be repeated once per skill:
5, Jane, Smith, ..., C#
5, Jane, Smith, ..., SQL
5, Jane, Smith, ..., Azure
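To make the row multiplication concrete, here is a small pure-Python imitation of what the OUTER APPLY does to the sample JSON. It is only an illustration of the semantics, not how SQL Server implements it, and the function name outer_apply_skills is invented for this sketch.

```python
def outer_apply_skills(people):
    """Return one (id, first name, last name, skill) tuple per skill."""
    rows = []
    for person in people:
        info = person.get("info", {})
        # OUTER APPLY keeps a person whose skills array is missing or empty,
        # producing a single row with a NULL (None here) skill.
        skills = info.get("skills") or [None]
        for skill in skills:
            rows.append((person["id"], info.get("name"), info.get("surname"), skill))
    return rows


people = [
    {"id": 2, "info": {"name": "John", "surname": "Smith"}, "age": 25},
    {"id": 5, "info": {"name": "Jane", "surname": "Smith",
                       "skills": ["SQL", "C#", "Azure"]},
     "dob": "2005-11-04T12:00:00"},
]
rows = outer_apply_skills(people)
```

With the sample data this gives four rows: one for John with no skill, and three for Jane, one per entry in her skills array.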
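Perhaps your JSON doesn't have this structure. If it does, you can either leave it packed (don't OUTER APPLY) or use a WHERE clause to remove the rows you aren't interested in: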
INSERT INTO Person(id, fn, ln, age, dob, skill)
SELECT id, firstName, lastName, age, dateOfBirth, skill
FROM
OPENJSON(@json)
WITH (
...
skills nvarchar(max) '$.info.skills' as json)
OUTER APPLY
OPENJSON( skills ) --to "recurse" into the skills array
WITH( skill nvarchar(8) '$' )
WHERE skill = 'C#'
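Applied to the flat records in the question, no OUTER APPLY is needed at all. The following is a rough sketch under several assumptions that the question does not confirm: the column types (date and nvarchar(255)), that the API returns dates in a format SQL Server can read as date, a cursor with "?" placeholders, and the helper names build_insert_sql and insert_page, which are invented here. The composite key is built with CONCAT on the server, which treats NULL as an empty string rather than the text "None".

```python
import json

# JSON keys that map one-to-one onto columns of the same name (from the question).
DATE_COLUMNS = ["activity_date", "aircraft_delivery_date"]
TEXT_COLUMNS = [
    "activity_name", "activity_remark", "aircraft_category",
    "aircraft_financial_status", "aircraft_group", "aircraft_model",
    "aircraft_operational_status", "aircraft_registration_number",
    "aircraft_serial_number", "engine_model", "engine_type", "operator_name",
]


def build_insert_sql(table):
    """Build one INSERT ... SELECT ... FROM OPENJSON(?) statement.

    `table` is interpolated into the SQL text, so it must come from trusted
    configuration, never from user input.
    """
    # Column types below are assumptions; adjust them to the real table.
    with_items = [f"{c} date '$.{c}'" for c in DATE_COLUMNS]
    with_items += [f"{c} nvarchar(255) '$.{c}'" for c in TEXT_COLUMNS]
    # The question reads the key 'aircraft_type_lar' into [aircraft_type].
    with_items.append("aircraft_type nvarchar(255) '$.aircraft_type_lar'")
    columns = DATE_COLUMNS + TEXT_COLUMNS + ["aircraft_type"]
    column_list = ", ".join(f"[{c}]" for c in columns)
    return (
        f"INSERT INTO {table} ([ac_serial_reg_key], {column_list})\n"
        "SELECT CONCAT([aircraft_registration_number], '_', "
        f"[aircraft_serial_number]), {column_list}\n"
        "FROM OPENJSON(?)\n"
        "WITH (" + ", ".join(with_items) + ")"
    )


def insert_page(cursor, table, results):
    """Send one page of API results (a list of dicts) as a single parameter."""
    cursor.execute(build_insert_sql(table), (json.dumps(results),))
```

In the question's code this would be called once per page, for example insert_page(self.cursor, self.database, self.response.json()['results']) followed by self.cnxn.commit(), so each page of 1,000 records costs one statement instead of one long concatenated VALUES list.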