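I'm running a Django application on a Linux machine. The application uses a MySQL database that contains a list of more than 40,000 employers. I have a Django management command that iterates over this list and, for each employer, downloads its employees and inserts them into the database. Some employers have fewer than 100 employees, but others have thousands, so we are talking about a lot of data.

When I run the command, everything seems to work fine until, at some point (after about an hour), I get a MemoryError. I can't figure out where I'm leaking memory. Here is my code: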

    import urllib3
    import xml.dom.minidom

    from django.db import transaction


    class API:
        @staticmethod
        def getEmployees(employer):
            employees = []
            url = 'someapiurl'
            http_pool = urllib3.connection_from_url(url)
            req = http_pool.get_url(url)
            # This parsing takes between 0.1 and 5 seconds, depending on the size of the response.
            doc = xml.dom.minidom.parseString(req.data)
            nodes = doc.getElementsByTagName(MATCHING_ELEMENTS)
            for node in nodes:
                employee = Employee()
                employee.createDataFromNode(node)
                employees.append(employee)
            return employees


    class SomeClass:
        # Only commit on success, so that a commit does not happen every time save() is called.
        @staticmethod
        @transaction.commit_on_success
        def createAll():
            # There are over 40,000 employers.
            for employer in Employers.objects.all():
                SomeClass.createEmployeesForEmployer(employer)

        @staticmethod
        def createEmployeesForEmployer(employer):
            employees = API.getEmployees(employer)
            for employee in employees:
                employee.save()

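To narrow down where the memory goes, I could sample memory usage while the loop runs. The following is only a minimal sketch, assuming Python 3.4+ (for the standard `tracemalloc` module); `track_memory`, `process`, and `leaky` are hypothetical names for illustration and are not part of my code. In my case `process` would stand in for `SomeClass.createEmployeesForEmployer`.

```python
import tracemalloc


def track_memory(items, process, report_every=1000):
    """Call process(item) for each item and sample traced memory periodically.

    Returns a list of (items_processed, current_traced_bytes) samples.
    """
    samples = []
    tracemalloc.start()
    try:
        for count, item in enumerate(items, start=1):
            process(item)
            if count % report_every == 0:
                # get_traced_memory() returns (current, peak) in bytes.
                current, _peak = tracemalloc.get_traced_memory()
                samples.append((count, current))
    finally:
        tracemalloc.stop()
    return samples


# Hypothetical stand-in for a per-employer step that never releases its data.
retained = []


def leaky(item):
    retained.append("x" * 1000 + str(item))


for count, current in track_memory(range(5000), leaky):
    print(count, current)
```

If the sampled values keep climbing with the number of items processed, something is holding references across iterations; if they stay roughly flat, the growth is happening outside the per-item step.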
What could be the problem? Thanks!