django - Speeding up creation of model objects

Date: 2015-07-11 01:33:59

Tags: django

I have several files that are parsed and loaded into a Django 1.7.7 database. Here is the gist of it:

# models.py
from django.db import models

class Bookstore(models.Model):
    name = models.CharField(max_length=20)
    def __unicode__(self):
        return self.name

class Book(models.Model):
    store = models.ForeignKey(Bookstore)
    title = models.CharField(max_length=20)
    def __unicode__(self):
        return str(self.store)

# the code for writing to the db:
from django.core.management.base import BaseCommand
from myapp.models import Bookstore, Book

class Command(BaseCommand):
    def handle(self, *args, **options):
        for i in range(100):
            bs = Bookstore.objects.create(name='x')
            for j in range(10):
                print 'creating...'
                Book.objects.create(title='hi', store=bs)

The problem is that the real data is large, and loading the files into the database takes 50 minutes. How can I speed this up?

I tried parallelizing it with this code:

from multiprocessing import Pool
from functools import partial

def create_books(store):
    for j in range(100):
        print 'creating...'
        Book.objects.create(title='hi', store=store)


class Command(BaseCommand):
    def handle(self, *args, **options):
        stores = []
        for i in range(2):
            stores.append(Bookstore.objects.create(name='x'))
        pool = Pool(processes=2)
        func = partial(create_books)
        data = pool.map(func, stores)
        pool.close()
        pool.join()

I am using a Postgres database, which has thread-safe write operations. I get this error:

Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "python2.7/site-packages/django/core/management/__init__.py", line 385, in execute_from_command_line
    utility.execute()
  File "python2.7/site-packages/django/core/management/__init__.py", line 377, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "python2.7/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "python2.7/site-packages/django/core/management/base.py", line 338, in execute
    output = self.handle(*args, **options)
  File "~django_sample_parallel_create/myapp/myapp/management/commands/parse.py", line 20, in handle
    data = pool.map(func, stores)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
django.db.utils.DatabaseError: error with no message from the libpq

I also tried bulk_create:

class Command(BaseCommand):
    def handle(self, *args, **options):
        key = 1
        for i in range(100):
            bs = Bookstore.objects.create(name='x')
            books = []
            for j in range(100):
                books.append(Book.objects.create(pk=key, title='hi', store=bs))
                key += 1
            Book.objects.bulk_create(books)

which failed with:

django.db.utils.IntegrityError: duplicate key value violates unique constraint "myapp_book_pkey"
DETAIL:  Key (id)=(1) already exists.

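I tried deleting all the data to make sure the keys would not collide. I also tried resyncing the Postgres keys. It still fails, yet all the objects appear to have been created.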

1 answer:

Answer 0: (score: 3)

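Try replacing

books.append(Book.objects.create(...))

with

books.append(Book(title='hi', store=bs))

Book.objects.create() saves each row immediately, so the later bulk_create call tries to insert the same primary keys a second time. That is what raises the IntegrityError, and it is also why all the objects appear to exist afterwards.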