I have written a function that imports some data from a CSV file into the DB. But when I call it, I can see memory consumption growing very quickly, and the import slows down because of swap usage.
Here is the code:
import os
import re
import csv
from django.db import transaction
from django.template.defaultfilters import slugify
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
from videos.models import *
from django.db import IntegrityError
MAX_RECORDS_BULK = 500
def save_records(info_list, offset):
    i = offset * MAX_RECORDS_BULK
    with transaction.atomic():
        for info in info_list:
            try:
                with transaction.atomic():
                    try:
                        with transaction.atomic():
                            video = Video(
                                original_title=info['title'],
                                duration=info['duration'],
                                embed=info['embed']
                            )
                            video.save()
                    # Video already exists
                    except (IntegrityError, ValueError):
                        print "Video already exists"
                        continue
                    for image_url in info['images']:
                        screenshot = Screenshot(url=image_url, video=video)
                        screenshot.save()
                    for tag_title in info['tags']:
                        try:
                            with transaction.atomic():
                                tag = Tag.objects.get(title=tag_title)
                        except Tag.DoesNotExist:
                            try:
                                with transaction.atomic():
                                    tag = Tag(title=tag_title, slug=slugify(tag_title))
                                    tag.save()
                            except IntegrityError:
                                print "Couldn't create new tag"
                                continue
                        video.tags.add(tag)
                    video.save()
                    i += 1
                    print "Added record %d" % i
            except:
                continue
def csv_import(filename):
    with open(filename, 'rb') as csv_file:
        reader = csv.reader(csv_file, delimiter='|')
        info_list = []
        offset = 0
        for row in reader:
            info = {}
            info['title'] = row[3]
            info['embed'] = re.search('(?<=embed/)\w+', row[0]).group(0)
            info['images'] = []
            info['images'].append(row[1])
            info['images'].extend(row[2].split(';'))
            info['tags'] = []
            info['tags'].extend(row[4].split(';'))
            info['duration'] = row[7]
            info_list.append(info)
            if len(info_list) >= MAX_RECORDS_BULK:
                save_records(info_list, offset)
                info_list = []
                offset += 1
        save_records(info_list, offset)
I suspect I don't understand how garbage collection works in Python, but maybe there is some other problem. It would also be great if you could suggest how to track down the issue. Thanks.
Answer (score: 1)
Thanks to DanielRoseman, I just needed to set DEBUG=False.
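For reference, a minimal sketch of that change (assuming the settings module is myproject/settings.py, as in the import script above):

# myproject/settings.py
# With DEBUG off, Django stops recording every SQL query it executes,
# so the query log no longer grows while the import runs.
DEBUG = False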
Here is what the Django documentation says:

It is also important to remember that when running with DEBUG turned on, Django will remember every SQL query it executes. This is useful when you're debugging, but it'll rapidly consume memory on a production server.
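If DEBUG has to stay on while the import runs, a hedged alternative (not part of the original answer) is to clear the per-connection query log yourself with django.db.reset_queries(); checking len(connection.queries) is also a quick way to confirm the log is what's growing. A minimal sketch, to be dropped into csv_import() after each save_records() call:

from django.db import connection, reset_queries

# Inside csv_import(), after each call to save_records(...):
print "Logged queries so far: %d" % len(connection.queries)  # confirms the query log is growing
reset_queries()  # empty the per-connection query log so it cannot keep accumulating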