MemoryError during bulk data insertion with Django

Date: 2012-01-12 02:02:57

Tags: mysql django memory

I have a Django application running on a Linux machine. The application uses a MySQL database containing a list of over 40,000 employers. I have a Django management command that walks through this employer list and, for each employer, downloads the employee data and inserts it into the database. Some employers have fewer than 100 employees, but some have thousands, so we are talking about a large amount of data.
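For context, the command entry point is wired up roughly like the minimal sketch below (the module path, command name, and import are placeholders, not the actual project layout):

# myapp/management/commands/load_employees.py -- hypothetical path and name
from django.core.management.base import BaseCommand

from myapp.api import SomeClass  # assumed import; SomeClass is defined below

class Command(BaseCommand):
    help = 'Download and insert employee data for every employer'

    def handle(self, *args, **options):
        # all of the work is delegated to the batch loader shown below
        SomeClass.createAll()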

When I run this command, everything seems to work fine until at some point (after roughly an hour) I get a MemoryError. I cannot figure out where I am leaking memory. Here is my code:

import xml.dom.minidom

import urllib3
from django.db import transaction

# model imports assumed; adjust to match your project
# from myapp.models import Employee, Employers

class API:
    @staticmethod
    def getEmployees(employer):

        employees = []

        url = 'someapiurl'
        http_pool = urllib3.connection_from_url(url)
        req = http_pool.get_url(url)

        #this parsing takes between 0.1 to 5 seconds, depending on the size of the response
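        #note: minidom builds the entire DOM tree in memory; for responses with
        #thousands of employees a streaming parser (xml.sax or
        #xml.etree.ElementTree.iterparse) would keep the footprint much smaller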
        doc = xml.dom.minidom.parseString(req.data)

        nodes = doc.getElementsByTagName(MATCHING_ELEMENTS)

        for node in nodes:
            employee = Employee()
            employee.createDataFromNode(node)
            employees.append(employee)

        return employees


class SomeClass:
    #only commit on success - this is to prevent commits from happening every single time save() is called.  

    @staticmethod
    @transaction.commit_on_success
    def createAll():
        #there are over 40,000 employers
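        #note: iterating .all() caches every fetched row in the queryset's
        #result cache, so all 40,000 employer objects stay referenced here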
        for employer in Employers.objects.all():
            SomeClass.createEmployeesForEmployer(employer)

    @staticmethod
    def createEmployeesForEmployer(employer):
        employees = API.getEmployees(employer)
        for employee in employees:
            employee.save()

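For reference, two well-documented places where Django holds on to memory in a loop like this: with DEBUG = True, Django appends every executed query to django.db.connection.queries, and a plain queryset caches every row it fetches. A lower-memory variant of the driver loop might look like this sketch (create_all_streaming is a hypothetical name; iterator() and reset_queries() are standard Django):

from django import db

def create_all_streaming():
    #iterator() streams rows in chunks instead of caching the entire
    #40,000-row queryset in memory
    for employer in Employers.objects.all().iterator():
        SomeClass.createEmployeesForEmployer(employer)
        #with DEBUG = True every query is logged in db.connection.queries;
        #clearing it keeps the log from growing without bound
        db.reset_queries()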
What could the problem be? Thanks!

0 Answers:

No answers