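I have a bunch of objects in a response object that I save to the database. Doing it one object at a time is very slow, because with 30k objects it means 30k commits to the database.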
for obj in response['RESULTS']:
    _city = City.objects.create(
        id=obj['id'],
        name=obj['name'],
        shortname=obj['shortname'],
        location=obj['location'],
        region=region_fk
    )
    _events = Event.objects.get(pk=obj['Event'])
    _events.city_set.add(_city)
My new approach, using bulk_create(), looks like this:
bulk_list = []
for obj in response['RESULTS']:
    # get the foreignkey instead of duplicating data
    if obj.get('Region'):
        region_fk = Region.objects.get(pk=obj['Region'])
    bulk_list.append(
        City(
            id=obj['id'],
            name=obj['name'],
            shortname=obj['shortname'],
            location=obj['location'],
            region=region_fk
        )
    )
bulk_save = City.objects.bulk_create(bulk_list)
While this is much faster than what I tried before, it has a problem: now I don't know how to add my M2M relationships.
from django.db import models


class City(models.Model):
    id = models.CharField(primary_key=True, max_length=64)
    name = models.CharField(max_length=32)
    shortname = models.CharField(max_length=32)
    location = models.CharField(max_length=32)
    # ForeignKey takes the target model, not max_length.
    # on_delete is required since Django 2.0; CASCADE is assumed here.
    region = models.ForeignKey('Region', on_delete=models.CASCADE)
    # String reference, because Event is defined further down.
    events = models.ManyToManyField('Event')


class Event(models.Model):
    id = models.CharField(primary_key=True, max_length=64)
    description = models.TextField()
    date = models.DateTimeField()


class Region(models.Model):
    id = models.IntegerField(primary_key=True)
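I've looked around Stack Overflow and found some examples, but I don't fully understand them. Most of the answers seem to discuss bulk_create for M2M relationships by way of a through model, and I'm not sure that's what I'm looking for.
Any help or pointers are much appreciated. Thanks.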
The response I get:
"RESULT": [
    {
        "City": [
            {
                "id": "349bc6ab-1c82-46b9-889e-2cc534d5717e",
                "name": "Stockholm",
                "shortname": "Sthlm",
                "location": "Sweden",
                "region": [
                    2
                ],
                "events": [
                    {
                        "id": "989b6563-97d2-4b7d-83a2-03c9cc774c21",
                        "description": "some text",
                        "date": "2017-06-19T00:00:00"
                    },
                    {
                        "id": "70613514-e569-4af4-b770-a7bc9037ddc2",
                        "description": "some text",
                        "date": "2017-06-20T00:00:00"
                    },
                    {
                        "id": "7533c16b-3b3a-4b81-9d1b-af528ec6e52b",
                        "description": "some text",
                        "date": "2017-06-22T00:00:00"
                    }
                ]
            }
        ]
    }
]
Answer 0 (score: 0)
It depends.
If your M2M relationship has no explicit through model, then a possible solution using the Django ORM is:
from itertools import groupby
from operator import itemgetter

# Create all ``City`` objects (like you did in your second example):
cities = City.objects.bulk_create(
    [
        City(
            id=obj['id'],
            name=obj['name'],
            shortname=obj['shortname'],
            location=obj['location'],
            # Set the foreign key by raw id to avoid one query per object.
            region_id=obj['Region']
        ) for obj in response['RESULTS']
    ]
)

# Select all related ``Event`` objects.
events = Event.objects.in_bulk([obj['Event'] for obj in response['RESULTS']])

# Add all related cities to corresponding events.
# ``groupby`` only groups consecutive items, so sort by event id first.
by_event = itemgetter('Event')
results_by_event = sorted(response['RESULTS'], key=by_event)
for event_id, event_cities_raw in groupby(results_by_event, by_event):
    event = events[event_id]
    # To avoid DB queries we can gather all cities ids from response
    city_ids = {city['id'] for city in event_cities_raw}
    # And get saved objects from bulk_create result, which are required for ``add`` method.
    event_cities = [city for city in cities if city.pk in city_ids]
    event.city_set.add(*event_cities)
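That is 1 bulk_create query and 1 in_bulk query, plus one event.city_set.add call for each unique event in the response (each add call writes all of that event's links to the join table in bulk rather than one by one).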
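One detail of the loop above is worth spelling out: itertools.groupby only merges consecutive items that share a key, so the response has to be sorted by event id first, or grouped with a dict instead. Below is a minimal sketch of the dict-based variant. It assumes every item in the response carries a single event id under the 'Event' key, as in the question's first snippet; the helper name group_city_ids_by_event and the sample data are made up for illustration.

```python
from collections import defaultdict


def group_city_ids_by_event(results):
    """Map each event id to the list of city ids that reference it."""
    grouped = defaultdict(list)
    for obj in results:
        grouped[obj['Event']].append(obj['id'])
    return dict(grouped)


# Hypothetical sample data: the two "e1" rows are deliberately not adjacent.
sample_results = [
    {'id': 'c1', 'Event': 'e1'},
    {'id': 'c2', 'Event': 'e2'},
    {'id': 'c3', 'Event': 'e1'},
]
city_ids_by_event = group_city_ids_by_event(sample_results)
```

Grouping this way does not depend on the order of the response, which is probably the safer default when the order is not known.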
With an explicit through model, it should be possible to use another bulk_create for that model; in other words, to replace all the event.city_set.add queries with a single ExplicitThrough.objects.bulk_create.
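As a rough sketch of that idea: even without a custom through model, Django exposes the auto-created join model of the field as City.events.through, with city_id and event_id as its foreign-key columns. The helper below is hypothetical; it takes the through model as a parameter so the row-building step can be shown without a database, and it assumes each response item has one event id under 'Event'. FakeThrough is only a stand-in to make the sketch runnable.

```python
from collections import namedtuple


def build_through_rows(results, through_model):
    """Build one unsaved join-table row per (city, event) pair."""
    return [
        through_model(city_id=obj['id'], event_id=obj['Event'])
        for obj in results
    ]


# Stand-in for the real through model, used only for this illustration.
FakeThrough = namedtuple('FakeThrough', ['city_id', 'event_id'])

sample_results = [
    {'id': 'c1', 'Event': 'e1'},
    {'id': 'c2', 'Event': 'e1'},
]
rows = build_through_rows(sample_results, FakeThrough)
```

In a real project this would look something like Through = City.events.through followed by Through.objects.bulk_create(build_through_rows(response['RESULTS'], Through)), which writes all the links in bulk instead of issuing one add call per event.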
You may also need to handle the case where an event from response['RESULTS'] does not exist yet; then you have to create those objects with another bulk_create.
After the comment:
If some events from response['RESULTS'] don't exist in the database.
In that case you can run another bulk_create right below the Event.objects.in_bulk query:
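# bulk_create expects model instances; this assumes obj['Event'] is a dict whose keys match the Event fields.
new_events = Event.objects.bulk_create([Event(**obj['Event']) for obj in response['RESULTS'] if obj['Event']['id'] not in events])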
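But here it depends on the structure of the objects in response['RESULTS']. In general, though, you need to create the missing events at this point. It should be faster than using Event.objects.get_or_create calls.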
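To make that a bit more concrete, here is a small sketch of how the missing events could be picked out before the second bulk_create. It assumes obj['Event'] is a dict with an 'id' key, as in the snippet above, and that existing is the dict returned by Event.objects.in_bulk(...), which is keyed by primary key. The function name and the sample data are invented for illustration.

```python
def missing_event_payloads(results, existing):
    """Return one payload per event id that is absent from ``existing``."""
    missing = {}
    for obj in results:
        event = obj['Event']
        if event['id'] not in existing:
            # Keyed by id so an event shared by several cities appears once.
            missing.setdefault(event['id'], event)
    return list(missing.values())


# Hypothetical sample data: "e2" is new and is referenced by two cities.
sample_results = [
    {'id': 'c1', 'Event': {'id': 'e1', 'description': 'some text'}},
    {'id': 'c2', 'Event': {'id': 'e2', 'description': 'some text'}},
    {'id': 'c3', 'Event': {'id': 'e2', 'description': 'some text'}},
]
existing = {'e1': object()}  # stands in for the in_bulk() result
to_create = missing_event_payloads(sample_results, existing)
```

Each payload can then be turned into an unsaved Event instance and passed to Event.objects.bulk_create. De-duplicating by id matters because several cities may share the same new event, and inserting the same primary key twice would fail.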