Aggregating/optimizing object.save()?

Time: 2014-02-25 14:41:37

Tags: python django postgresql

I'm working on an import feature that lets users create Django model objects from a selected CSV file.

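The models are related to each other through foreign keys and many-to-many fields. There are a lot of object.save() and Object.objects.get(...) calls in my code, and I think that is what makes it run so slowly.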

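When an error occurs (for example an integrity error), I need to roll back all changes made to the database. So I decorate my view with transaction.atomic, and that works fine.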
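As a rough, framework-free illustration of the all-or-nothing behaviour described above, the sketch below uses Python's built-in sqlite3 module instead of Django's transaction.atomic. The subject table, the import_rows helper and the sample rows are made up for this example and are not taken from the project.

```python
import sqlite3

# In-memory database with a hypothetical "subject" table.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE subject (id INTEGER PRIMARY KEY, name TEXT)')
conn.commit()


def import_rows(rows):
    """Insert all rows, or none of them if any insert fails."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception leaves the block.
        with conn:
            for row in rows:
                conn.execute('INSERT INTO subject (id, name) VALUES (?, ?)', row)
    except sqlite3.IntegrityError:
        return False
    return True


# The third row reuses primary key 1, so the whole import is rolled back.
ok = import_rows([(1, 'Algebra'), (2, 'Physics'), (1, 'Duplicate id')])
count = conn.execute('SELECT COUNT(*) FROM subject').fetchone()[0]
```

Here the first two inserts are undone when the third one fails, which is the same guarantee the decorated view relies on.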

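The problem is that my import is very slow. Parsing a file with ~2000 lines (which adds roughly 1000 objects to my database) takes about 3 minutes, which is far too long.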

Is there a way to make it faster? I've read about the bulk_create function, but "it does not work with many-to-many relationships."

If it matters, I'm using PostgreSQL.
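One commonly suggested way around the many-to-many limitation is to collect the link rows yourself and write them in batches. The sketch below shows only the collecting and batching logic in plain Python. bulk_insert is a hypothetical callable standing in for the database write; in Django it could plausibly wrap bulk_create on the automatically created through model (TMUser.terms.through), but that is an assumption about the models in question and has not been tested against them.

```python
def collect_links(pairs):
    """Deduplicate (student_uid, term_id) pairs while preserving order."""
    seen = set()
    links = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            links.append(pair)
    return links


def flush_in_batches(rows, bulk_insert, batch_size=500):
    """Call bulk_insert once per batch and return the number of calls made."""
    calls = 0
    for start in range(0, len(rows), batch_size):
        bulk_insert(rows[start:start + batch_size])
        calls += 1
    return calls


# A plain list stands in for the database table (hypothetical sample data).
stored = []
pairs = [(1, 10), (2, 10), (1, 10), (3, 11)]
links = collect_links(pairs)
calls = flush_in_batches(links, stored.extend, batch_size=2)
```

With this shape, the number of write calls grows with the number of batches rather than with the number of students.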

EDIT: The file structure looks like this:

subject_name
day [A/B] begins_at - ends_at;lecturer_info  

followed by multiple lines like:

student_uid;student_info  
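To make the three line types concrete, here is a small sketch that classifies lines the same way the import does: no semicolon means a subject, a leading numeric uid means a student, and anything else is a term. The classify helper and the sample lines are hypothetical; only the regular expression is taken from the question's code.

```python
import re

# Same pattern as in the question: a numeric uid, a semicolon, then the rest.
STUDENT_LINE = re.compile(r'[0-9]+;.+')


def classify(line):
    """Return 'subject', 'term' or 'student' for one non-empty line."""
    if ';' not in line:
        return 'subject'
    if STUDENT_LINE.match(line):
        return 'student'
    return 'term'


# Made-up sample lines following the structure described above.
sample = [
    'Algebra',
    'monday A 08:00 - 09:30;John Smith',
    '12345;Jane Doe',
]
kinds = [classify(line) for line in sample]
```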

OK, here is the code.

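import re

from django.contrib.auth.hashers import make_password

# Subject, Term, Lecturer and TMUser are the project's own models (imports omitted).


def csv_import(market, csv_file):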
    lines = [line.strip().decode('utf-8') for line in csv_file.readlines()]
    lines = [line for line in lines if line]
    pattern = re.compile(r'[0-9]+;.+')   

    week_days = {
        'monday': 0,
        # ... (remaining days elided in the original)
    }

    term, subject, lecturer, student = None, None, None, None

    for number, line in enumerate(lines):
        if ';' not in line:
            subject = Subject(subject_id=number, name=line, market=market)
            subject.save()
        elif not pattern.match(line):
            term_info, lecturer_info = line.split(';')  # term_info - 'day begins_at - ends_at', lecturer_info - lecturer
            term_info = term_info.replace(' - ', ' ').split()
            term = Term(term_id=number, subject=subject, day=week_days[term_info[0]], begin_at=term_info[-2],
                        ends_at=term_info[-1])

            if len(term_info) == 4:
                term.week = term_info[1]

            lecturer_info = lecturer_info.rsplit(' ', 1)
            try:
                lecturer = Lecturer.objects.get(first_name=lecturer_info[0], last_name=lecturer_info[1])
            except Lecturer.DoesNotExist:
                lecturer = Lecturer(first_name=lecturer_info[0], last_name=lecturer_info[1])
                lecturer.save()

            term.lecturer = lecturer

            term.save()
        else:
            gradebook_id, student_info = line.split(';')
            student_info = student_info.rsplit(' ', 1)
            try:
                student = TMUser.objects.get(uid=int(gradebook_id))
            except TMUser.DoesNotExist:
                student = TMUser(uid=int(gradebook_id), username='student'+gradebook_id, first_name=student_info[0],
                                 last_name=student_info[1], password=make_password('passwd'), user_group='user')
                student.save()
            student.terms.add(term)
            student.save()

1 answer:

Answer 0 (score: 0)

Here is some pseudocode showing the basic idea of caching results:

cache = {}

for number, line in enumerate(lines):
   ...
   elif not pattern.match(line):
      ...
      term = Term(term_id=number, subject=subject, ...)

      lecturer_id = (lecturer_info[0], lecturer_info[1])   # (first name, last name)
      if lecturer_id in cache:
         # retrieve from cache
         lecturer = cache[lecturer_id]
      else:
         try:
            lecturer = Lecturer.objects.get(first_name=lecturer_id[0], last_name=lecturer_id[1])
         except Lecturer.DoesNotExist:
            lecturer = Lecturer(first_name=lecturer_id[0], last_name=lecturer_id[1])
            lecturer.save()
         # add to cache
         cache[lecturer_id] = lecturer

      term.lecturer = lecturer
      term.save()   

      #etc.
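The caching idea above can also be tried out without a database. In the sketch below, get_or_create_cached, fetch, create and the in-memory backend are all hypothetical stand-ins (fetch plays the role of Lecturer.objects.get and create the role of save()); the queries list just records how often the backend is touched.

```python
def get_or_create_cached(cache, key, fetch, create):
    """Return the object for key, touching the backend only on a cache miss."""
    if key in cache:
        return cache[key]
    obj = fetch(key)
    if obj is None:
        obj = create(key)
    cache[key] = obj
    return obj


# Hypothetical in-memory backend standing in for the lecturer table.
backend = {('John', 'Smith'): 'lecturer-1'}
queries = []


def fetch(key):
    queries.append(('get', key))
    return backend.get(key)


def create(key):
    queries.append(('save', key))
    backend[key] = 'lecturer-%d' % (len(backend) + 1)
    return backend[key]


cache = {}
names = [('John', 'Smith'), ('Ann', 'Lee'), ('John', 'Smith'), ('Ann', 'Lee')]
result = [get_or_create_cached(cache, name, fetch, create) for name in names]
```

Four lookups cost three backend operations here (two gets and one save), and every repeated name after that is served from the dictionary. The same pattern would apply to the student lookups by uid.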