Question

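I have a bunch of objects in a response object that I save to the database. Doing it one object at a time is very slow, because with 30k objects that effectively means 30k commits to the database.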

Example 1:

for obj in response['RESULTS']:

    _city = City.objects.create(
        id=obj['id'],
        name=obj['name'],
        shortname=obj['shortname'],
        location=obj['location'],
        region=region_fk
    )

    _events = Event.objects.get(pk=obj['Event'])
    _events.city_set.add(_city)

My new approach, using bulk_create(), looks like this:

Example 2:

bulk_list = []

for obj in response['RESULTS']:

    # get the foreignkey instead of duplicating data

    if obj.get('Region'):
        region_fk = Region.objects.get(pk=obj['Region'])      

    bulk_list.append(
        City(
            id=obj['id'],
            name=obj['name'],
            shortname=obj['shortname'],
            location=obj['location'],
            region=region_fk
        )
    )

bulk_save = City.objects.bulk_create(bulk_list)

Although this is much faster than my previous attempt, it has a problem: I no longer know how to add my M2M relations.

models.py

class City(models.Model):

    id = models.CharField(primary_key=True, max_length=64)
    name = models.CharField(max_length=32)
    shortname = models.CharField(max_length=32)
    location = models.CharField(max_length=32)
    # String references let the models be declared in any order.
    region = models.ForeignKey('Region', on_delete=models.CASCADE)
    events = models.ManyToManyField('Event')


class Event(models.Model):

    id = models.CharField(primary_key=True, max_length=64)
    description = models.TextField()
    date = models.DateTimeField()

class Region(models.Model):

    id = models.IntegerField(primary_key=True)

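Problem

I have looked around Stack Overflow and found some examples, but I don't fully understand them. Most of the answers seem to discuss bulk_create for M2M relations together with a through model, and I'm not sure that is what I'm looking for.

How do I add the M2M relations?
Please break it down so that I can understand; I want to learn :-)

Any help or pointers are much appreciated. Thanks.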

Additional information

I'm running:

PostgreSQL
Django == 1.11

Django documentation on this topic

https://docs.djangoproject.com/en/1.11/ref/models/querysets/#bulk-create

Response example:

"RESULT": [
  {
    "City": [
      {
        "id": "349bc6ab-1c82-46b9-889e-2cc534d5717e",
        "name": "Stockholm",
        "shortname": "Sthlm",
        "location": "Sweden",
        "region": [
          2
        ],
        "events": [
          {
            "id": "989b6563-97d2-4b7d-83a2-03c9cc774c21",
            "description": "some text",
            "date": "2017-06-19T00:00:00"
          },
          {
            "id": "70613514-e569-4af4-b770-a7bc9037ddc2",
            "description": "some text",
            "date": "2017-06-20T00:00:00"
          },
          {
            "id": "7533c16b-3b3a-4b81-9d1b-af528ec6e52b",
            "description": "some text",
            "date": "2017-06-22T00:00:00"
          }
        ]
      }
    ]
  }
]

Answer 1

It depends.

If your M2M relation has no explicit through model, a possible solution using the Django ORM is:

from itertools import groupby
from operator import itemgetter

# Create all ``City`` objects (as in your second example):
cities = City.objects.bulk_create(
    [
        City(
            id=obj['id'],
            name=obj['name'],
            shortname=obj['shortname'],
            location=obj['location'],
            # Assumes obj['Region'] holds the Region primary key,
            # as in the question's second example.
            region_id=obj['Region']
        ) for obj in response['RESULTS']
    ]
)

# Select all related ``Event`` objects in one query.
events = Event.objects.in_bulk([obj['Event'] for obj in response['RESULTS']])

# groupby only merges adjacent items, so sort by event id first.
results_by_event = sorted(response['RESULTS'], key=itemgetter('Event'))

# Add all related cities to the corresponding events:
for event_id, event_cities_raw in groupby(results_by_event, key=itemgetter('Event')):
    event = events[event_id]
    # To avoid DB queries, gather the city ids from the response
    city_ids = {city['id'] for city in event_cities_raw}
    # and pick the saved objects from the bulk_create result, which ``add`` requires.
    event_cities = [city for city in cities if city.pk in city_ids]
    event.city_set.add(*event_cities)

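That is 1 bulk_create query, 1 in_bulk query, plus one event.city_set.add call for each unique event in the response (each add inserts that event's join-table rows in a single bulk INSERT, after a SELECT that checks for existing relations).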
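
The per-event count above assumes that all rows for one event are grouped together: itertools.groupby only merges adjacent items, so unsorted input yields several groups (and several add calls) for the same event. A minimal sketch with made-up rows (the data and the helper name group_city_ids are illustrative, not from the original code):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical flat rows: one city per row, 'Event' holds the event pk.
rows = [
    {'id': 'c1', 'Event': 'e1'},
    {'id': 'c2', 'Event': 'e2'},
    {'id': 'c3', 'Event': 'e1'},
]


def group_city_ids(rows, sort_first):
    """Return the (event_id, [city ids]) groups exactly as groupby yields them."""
    if sort_first:
        rows = sorted(rows, key=itemgetter('Event'))
    return [
        (event_id, [row['id'] for row in group])
        for event_id, group in groupby(rows, key=itemgetter('Event'))
    ]


unsorted_groups = group_city_ids(rows, sort_first=False)  # 'e1' appears twice
sorted_groups = group_city_ids(rows, sort_first=True)     # one group per event
```

Sorting by the same key that groupby uses keeps the number of add calls equal to the number of distinct events.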

With an explicit through model, it should be possible to use another bulk_create on that model; in other words, to replace all the event.city_set.add queries with a single ExplicitThrough.objects.bulk_create.
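
Even without an explicit through model, Django auto-creates one for a ManyToManyField, reachable as City.events.through for the models in the question, so the same idea can probably be applied there too. The sketch below only prepares the (city_id, event_id) pairs in plain Python. It assumes flat rows with 'id' and 'Event' keys as in the code above, and build_link_pairs is a hypothetical helper name. The Django calls are shown as comments because they need a configured project:

```python
def build_link_pairs(results):
    """Collect unique (city_id, event_id) pairs, preserving first-seen order."""
    seen = set()
    pairs = []
    for obj in results:
        pair = (obj['id'], obj['Event'])
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


# Hypothetical sample shaped like the rows the answer iterates over.
sample = [
    {'id': 'c1', 'Event': 'e1'},
    {'id': 'c2', 'Event': 'e1'},
    {'id': 'c1', 'Event': 'e1'},  # duplicate row, dropped below
]
pairs = build_link_pairs(sample)

# Inside a Django project the pairs would become join-table rows, e.g.:
#   Through = City.events.through
#   Through.objects.bulk_create(
#       [Through(city_id=c, event_id=e) for c, e in pairs]
#   )
```

Unlike add, this path does not check for relations that already exist, so pairs already present in the join table would violate its unique constraint. That is why the helper removes duplicates up front.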

You may also need to handle the case where events in response['RESULTS'] do not exist yet; you would then have to create those objects with another bulk_create.

From the comments:

For the case where some events in response['RESULTS'] do not exist in the database, you can run another bulk_create right after the Event.objects.in_bulk query:

# Assumes each obj['Event'] is a dict of Event fields (id, description, date).
new_events = Event.objects.bulk_create([Event(**obj['Event']) for obj in response['RESULTS'] if obj['Event']['id'] not in events])

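Here, though, it depends on the structure of the objects in response['RESULTS']. In general, you need to create the missing events at this point. It should be faster than calling Event.objects.get_or_create for each one.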
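
One detail worth guarding against when creating the missing events: if the same new event appears under several cities, it must be created only once, or the insert fails on the duplicate primary key. A minimal sketch of that filtering step, assuming each obj['Event'] is a dict of event fields with an 'id' key (select_missing_events and the sample data are hypothetical):

```python
def select_missing_events(results, existing_ids):
    """Return the field dicts of events not yet stored, one per event id."""
    missing = {}
    for obj in results:
        fields = obj['Event']
        if fields['id'] not in existing_ids:
            # setdefault keeps the first occurrence and ignores repeats.
            missing.setdefault(fields['id'], fields)
    return list(missing.values())


# Hypothetical rows: event 'e2' is new and referenced by two cities.
sample = [
    {'id': 'c1', 'Event': {'id': 'e1', 'description': 'known'}},
    {'id': 'c2', 'Event': {'id': 'e2', 'description': 'new'}},
    {'id': 'c3', 'Event': {'id': 'e2', 'description': 'new'}},
]
to_create = select_missing_events(sample, existing_ids={'e1'})

# Inside a Django project this could feed a single query, e.g.:
#   events = Event.objects.in_bulk([obj['Event']['id'] for obj in results])
#   Event.objects.bulk_create([Event(**f) for f in select_missing_events(results, events)])
```

Because in_bulk returns a dict keyed by primary key, it can be passed directly as existing_ids.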

Using Django bulk_create with M2M relationships