I have a cron job that downloads data from MySQL every night at 3am. I can test the connection and the download, and it works. Occasionally, though, the download partially fails (a partial download), and if I try to re-run the py script it barfs with a duplicate entry error for key 2.
I want to be able to run the script and delete only the previous night's entries, so I can re-run the script that populates the database. There are three other tables tied to this one. If I create a SQL script that deletes yesterday's records, what will Django do? Will it automatically delete the necessary rows from the other tables, or do I have to do that in the script myself?
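Roughly what I have in mind, as a sketch (the Patient model, the imported_at field, and the module path are stand-ins guessed from my setup, not my actual schema):

import datetime

from django.core.management.base import BaseCommand

from greaseboard.models import Patient


class Command(BaseCommand):
    help = "Remove last night's partial import so import_patients can be re-run."

    def handle(self, *args, **options):
        cutoff = datetime.date.today() - datetime.timedelta(days=1)
        # Will this also remove the rows added to the three related
        # tables, or do I need to delete those explicitly?
        Patient.objects.filter(imported_at__gte=cutoff).delete()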
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python2.6/site-packages/django/core/management/__init__.py", line 443, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python2.6/site-packages/django/core/management/__init__.py", line 382, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python2.6/site-packages/django/core/management/base.py", line 196, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/usr/local/lib/python2.6/site-packages/django/core/management/base.py", line 232, in execute
    output = self.handle(*args, **options)
  File "/usr/local/django/grease/greaseboard/management/commands/import_patients.py", line 27, in handle
    mrn = row.MRN,
  File "/usr/local/lib/python2.6/site-packages/django/db/models/manager.py", line 134, in get_or_create
    return self.get_query_set().get_or_create(**kwargs)
  File "/usr/local/lib/python2.6/site-packages/django/db/models/query.py", line 449, in get_or_create
    obj.save(force_insert=True, using=self.db)
  File "/usr/local/lib/python2.6/site-packages/django/db/models/base.py", line 463, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)
  File "/usr/local/lib/python2.6/site-packages/django/db/models/base.py", line 551, in save_base
    result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)
  File "/usr/local/lib/python2.6/site-packages/django/db/models/manager.py", line 203, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "/usr/local/lib/python2.6/site-packages/django/db/models/query.py", line 1576, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/usr/local/lib/python2.6/site-packages/django/db/models/sql/compiler.py", line 910, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python2.6/site-packages/django/db/backends/util.py", line 40, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python2.6/site-packages/django/db/backends/mysql/base.py", line 114, in execute
    return self.cursor.execute(query, args)
  File "/usr/local/lib/python2.6/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/lib/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
django.db.utils.IntegrityError: (1062, "Duplicate entry '000xxxxxxxx' for key 2")
Answer 0 (score: 0)
Not sure how big the job is, but for a similar problem I used transactions alongside savepoints - https://docs.djangoproject.com/en/dev/topics/db/transactions/#savepoints
So, given something like:
from django.db import transaction

# Django >= 1.6 API; older versions used the
# @transaction.commit_manually decorator instead.
transaction.set_autocommit(False)
try:
    for line in streaming_filelikeobject:  # iterate the file lazily, line by line
        product = do_work(line)
        MyTable(**product).save()
except SomeFileIOError:
    transaction.rollback()   # discard the partial batch
else:
    transaction.commit()     # keep it only if everything succeeded
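Since the link above is specifically about savepoints, a per-row variant is another option: roll back only the row that collides rather than the whole batch. A sketch under the same assumptions (MyTable, do_work and streaming_filelikeobject are the placeholders from above; the savepoint calls are from the Django transaction docs):

from django.db import IntegrityError, transaction

transaction.set_autocommit(False)
try:
    for line in streaming_filelikeobject:
        product = do_work(line)
        sid = transaction.savepoint()
        try:
            MyTable(**product).save()
        except IntegrityError:
            # A duplicate entry: undo just this save and carry on.
            transaction.savepoint_rollback(sid)
        else:
            transaction.savepoint_commit(sid)
except SomeFileIOError:
    transaction.rollback()
else:
    transaction.commit()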
Another idea is to have a batch_id column and assign it at the start of each batch.
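That way a failed run can be wiped by its id before re-running. A rough sketch, assuming batch_id has been added as an indexed column on MyTable (all names hypothetical):

import uuid

batch_id = uuid.uuid4().hex  # one id per nightly run

try:
    for line in streaming_filelikeobject:
        product = do_work(line)
        MyTable(batch_id=batch_id, **product).save()
except SomeFileIOError:
    # Remove only this run's rows; the job can then simply be re-run.
    MyTable.objects.filter(batch_id=batch_id).delete()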
For very large data sets, you could use something like Memcache/Redis to keep track of which records have already been imported.
import redis
from django.db import transaction

redis_conn = redis.StrictRedis()

transaction.set_autocommit(False)
try:
    for line in streaming_filelikeobject:
        product = do_work(line)
        # SADD returns 1 only when the id was not already in the set,
        # so rows seen on a previous (partial) run are skipped.
        if redis_conn.sadd("my_input_set", product['some_unique_id']):
            MyTable(**product).save()
except SomeFileIOError:
    transaction.rollback()
else:
    transaction.commit()
.sadd() is a Redis command that returns 1 (truthy) if the element was not already in the Redis set, and 0 if it was, so already-imported ids get skipped.
Please note I'm typing this stuff off the top of my head, so the Django transaction method signatures may not be authoritative.