I am trying to add about 40-50k rows to a pgsql database from Django, parsed from a text-file dump produced by a data-processing application.
Here is my function:
def populate_backup_db(dumpfile):
    sensordata = sensorrecords()  # This is the Model
    start_time = time.time()
    file = open(dumpfile)
    filedata = file.readlines()
    endcount = len(filedata)
    i = 0
    imagecount = 0
    while i < endcount:
        lineitem = split_entry(filedata[i])
        if lineitem[0] == "HEADER":
            imagecount = imagecount + 1
            sensordata.Sensor = lineitem[1]
            sensordata.Date1 = lineitem[2]
            sensordata.Date2 = lineitem[3]
            sensordata.Version = lineitem[4]
            sensordata.Proxyclient = lineitem[8]
            sensordata.Triggerdate = ctodatetime(lineitem[13])
            sensordata.Compression = lineitem[16]
            sensordata.Encryption = lineitem[17]
            sensordata.Fragments = lineitem[21]
            sensordata.Pbit = lineitem[37]
            sensordata.BlockIntFT = lineitem[38]
            sensordata.OriginServer = lineitem[56]
            sensordata.save()
        i = i + 1
    elapsed_time = time.time() - start_time
    print(imagecount, 'entries saved to database from ', dumpfile, '. Time Taken is ', elapsed_time, ' seconds.')
    file.close()
This takes about 2-3 minutes to save all the data to the database. The dump file will keep growing, so if the function stays as it is, saving everything could take many minutes.
How can I extract all the data from the dump file and then save it to the database in one go?
EDIT
I came across a Django method called bulk_create():
bulk_create(objs, batch_size=None, ignore_conflicts=False)
This method inserts the provided list of objects into the database in an efficient manner (generally only 1 query, no matter how many objects there are):
>>> Entry.objects.bulk_create([
... Entry(headline='This is a test'),
... Entry(headline='This is only a test'),
... ])
That example seems to add entries by hand, whereas my function runs a loop until it has picked up every entry, saving as it goes.
How do I use it with my loop? Do I replace sensordata.save() with some_list.append(sensordata) and then, once the loop has finished, run sensordata.objects.bulk_create(some_list)?
EDIT 2
I edited the code to append the objects to a list and then bulk-create them after the loop, as shown below:
def populate_backup_db(dumpfile):
    sensordata = sensorrecords()  # This is the Model
    datalist = []
    start_time = time.time()
    file = open(dumpfile)
    filedata = file.readlines()
    endcount = len(filedata)
    i = 0
    imagecount = 0
    while i < endcount:
        lineitem = split_entry(filedata[i])
        if lineitem[0] == "HEADER":
            imagecount = imagecount + 1
            sensordata.Sensor = lineitem[1]
            sensordata.Date1 = lineitem[2]
            sensordata.Date2 = lineitem[3]
            sensordata.Version = lineitem[4]
            sensordata.Proxyclient = lineitem[8]
            sensordata.Triggerdate = ctodatetime(lineitem[13])
            sensordata.Compression = lineitem[16]
            sensordata.Encryption = lineitem[17]
            sensordata.Fragments = lineitem[21]
            sensordata.Pbit = lineitem[37]
            sensordata.BlockIntFT = lineitem[38]
            sensordata.OriginServer = lineitem[56]
            datalist.append(sensordata)
        i = i + 1
    elapsed_time = time.time() - start_time
    print(imagecount, 'entries saved to database from ', dumpfile, '. Time Taken is ', elapsed_time, ' seconds.')
    sensordata.objects.bulk_create(datalist)
    file.close()
This throws the error below.
Traceback:
File "C:\Python\Python36\lib\site-packages\django\core\handlers\exception.py" in inner
34. response = get_response(request)
File "C:\Python\Python36\lib\site-packages\django\core\handlers\base.py" in _get_response
126. response = self.process_exception_by_middleware(e, request)
File "C:\Python\Python36\lib\site-packages\django\core\handlers\base.py" in _get_response
124. response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "C:\Python\Python36\lib\site-packages\django\contrib\auth\decorators.py" in _wrapped_view
21. return view_func(request, *args, **kwargs)
File "C:\Users\va\eclipse-workspace\prod\home\views.py" in process_data
68. get_backup_data()
File "C:\Users\va\eclipse-workspace\prod\home\process.py" in get_backup_data
8. populate_backup_db('c:\\users\\va\\desktop\\vsp\\backupdata_server.txt')
File "C:\Users\va\eclipse-workspace\prod\home\process.py" in populate_backup_db
122. backupdata.objects.bulk_create(backuplist)
File "C:\Python\Python36\lib\site-packages\django\db\models\manager.py" in __get__
176. raise AttributeError("Manager isn't accessible via %s instances" % cls.__name__)
Exception Type: AttributeError at /process_data/
Exception Value: Manager isn't accessible via backuprecords instances
Answer 0 (score: 0)
Yes, you have already answered your own question: append the records to a list, then call bulk_create once the loop completes.
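In outline, something like this (just a sketch; sensorrecords, split_entry, filedata and the field names are taken from your question):

    rows = []
    for line in filedata:
        lineitem = split_entry(line)
        if lineitem[0] == "HEADER":
            record = sensorrecords()     # a fresh, unsaved instance per row
            record.Sensor = lineitem[1]  # ...assign the remaining fields as before
            rows.append(record)
    sensorrecords.objects.bulk_create(rows)  # a single call on the model class, after the loop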
Answer 1 (score: 0)
OK, I found the solution. bulk_create has to be called via the original class name, not via an object you created, and each record needs to be a fresh instance built inside the loop.
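The difference in a minimal sketch (names taken from the question):

    sensordata = sensorrecords()
    sensordata.objects.bulk_create(datalist)     # AttributeError: "Manager isn't accessible via sensorrecords instances"
    sensorrecords.objects.bulk_create(datalist)  # works: the manager lives on the class

Here is the full corrected function: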
def populate_backup_db(dumpfile):
    datalist = []
    start_time = time.time()
    file = open(dumpfile)
    filedata = file.readlines()
    endcount = len(filedata)
    i = 0
    imagecount = 0
    while i < endcount:
        lineitem = split_entry(filedata[i])
        if lineitem[0] == "HEADER":
            imagecount = imagecount + 1
            sensordata = sensorrecords()  # initiating a fresh object here, inside the loop
            sensordata.Sensor = lineitem[1]
            sensordata.Date1 = lineitem[2]
            sensordata.Date2 = lineitem[3]
            sensordata.Version = lineitem[4]
            sensordata.Proxyclient = lineitem[8]
            sensordata.Triggerdate = ctodatetime(lineitem[13])
            sensordata.Compression = lineitem[16]
            sensordata.Encryption = lineitem[17]
            sensordata.Fragments = lineitem[21]
            sensordata.Pbit = lineitem[37]
            sensordata.BlockIntFT = lineitem[38]
            sensordata.OriginServer = lineitem[56]
            datalist.append(sensordata)
        i = i + 1
    sensorrecords.objects.bulk_create(datalist)  # This is the line which needed change; run it before timing the result
    elapsed_time = time.time() - start_time
    print(imagecount, 'entries saved to database from ', dumpfile, '. Time Taken is ', elapsed_time, ' seconds.')
    file.close()
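One more thing worth noting, though I have not benchmarked it: with 40-50k rows the single INSERT that bulk_create issues can get very large. bulk_create takes an optional batch_size argument (visible in the signature quoted above) that splits the work into several smaller queries, which keeps memory use and query size bounded; the 1000 below is an assumption to tune, not a measured optimum:

    sensorrecords.objects.bulk_create(datalist, batch_size=1000)  # insert in chunks of 1000 rows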