I'm trying to bulk insert a very large dataset into a MySQL database, and I'd like to use Django's bulk_create while ignoring duplicate errors.
Sample model:
class MyModel(models.Model):
    my_id = models.IntegerField(primary_key=True)
    start_time = models.DateTimeField()
    duration = models.IntegerField()
    # ...
    description = models.CharField(max_length=250)
So far I have the following code (generic for all my models; I pass in a Model_instance() and the [list of objects to bulk_create]):
def insert_many(model, my_objects):
    # list of ids where pk is unique
    in_db_ids = model.__class__.objects.values_list(model.__class__._meta.pk.name)
    if not in_db_ids:
        # nothing exists, save time and bulk_create
        model.__class__.objects.bulk_create(my_objects)
    else:
        in_db_ids_list = [elem[0] for elem in in_db_ids]
        to_insert = []
        for elem in my_objects:
            if not elem.pk in in_db_ids_list:
                to_insert.append(elem)
        if to_insert:
            model.__class__.objects.bulk_create(to_insert)
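Is there a way in Django to do this so that duplicates are avoided? Mimicking MySQL's insert ignore would be great. If I simply use bulk_create (which is very fast), I get an error when a primary key is duplicated, and the insert stops.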
Answer 0 (score: 6)
This function will do it.
Note: it only works if the pk is unique and the model has no other unique fields.
def insert_many(model, my_objects):
    manager = model.__class__.objects
    # pks already in the table; flat=True yields plain values instead of 1-tuples,
    # and a set makes each membership test O(1)
    seen_pks = set(manager.values_list(model.__class__._meta.pk.name, flat=True))
    to_insert = []
    for elem in my_objects:
        # skip rows already in the database and duplicates within my_objects itself
        if elem.pk not in seen_pks:
            seen_pks.add(elem.pk)
            to_insert.append(elem)
    if to_insert:
        manager.bulk_create(to_insert)
Usage: insert_many(MyModel(), list_of_myModels_defined_but_not_saved)
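To see the filtering step on its own, here is a minimal sketch that runs without a database. `Row` and `filter_new` are hypothetical names used only for illustration; `Row` stands in for an unsaved model instance. The point it shows is that the pks seen so far are tracked in a set, so duplicates inside the batch are skipped along with rows that already exist.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical stand-in for an unsaved model instance."""
    pk: int
    description: str = ""


def filter_new(existing_pks, candidates):
    """Return the candidates whose pk is neither stored nor repeated in the batch."""
    seen = set(existing_pks)  # pks already in the table
    fresh = []
    for obj in candidates:
        if obj.pk not in seen:
            seen.add(obj.pk)  # remember it so a later duplicate is skipped
            fresh.append(obj)
    return fresh


batch = [Row(1), Row(3), Row(3), Row(4)]
new_rows = filter_new([1, 2], batch)
```

Note that comparing `elem.pk` against a list of objects (rather than a collection of pks) would never match, so the pks have to be tracked separately, as above.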
Answer 1 (score: 3)
The ignore_conflicts parameter was added to bulk_create in Django 2.2. You can also find it in the Django source at https://github.com/django/django/search?q=ignore_conflicts&unscoped_q=ignore_conflicts