Python Django - Comparing an array against a model

Time: 2014-07-20 17:07:13

Tags: python django

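I have an Oracle database to which I cannot add new tables, so on the Django side I created a sqlite database, essentially just to sync items from the Oracle database into sqlite.

There are currently about 500,000 items in the Oracle database.

All primary keys in the Oracle database are incremental, but that is not guaranteed. Sometimes network problems break the connection between Django and the Oracle database, and I miss some values while syncing.

So I came up with a model in Django: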

class sequential_missing(models.Model):
    database = models.CharField(max_length=200, primary_key=True)
    row = models.IntegerField(primary_key=True)

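Basically, each row in this table records a row number that is missing on the Oracle side. I compare the sequence numbers missing from the sqlite database against it to determine which of them are in fact empty in the Oracle database. This speeds things up, because I do not have to check every missing sequence value against Oracle.

The whole function is as follows: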

def checkMissing(maxValue, databaseObjects, databaseName):
    missingValues = []

    #############SECTION 1##########################
    print "Database:" + databaseName
    print "Checking for Missing Sequential Numbers"
    set_of_pk_values = set(databaseObjects.objects.all().values_list('pk', flat=True))
    set_one_to_max_value = set(xrange(1, maxValue+1))
    missingValues = set_one_to_max_value.difference(set_of_pk_values)
    #############SECTION 1##########################

    #Even though missingValues could be enough, but the problem is that not even Oracle can
    #guarantee the automatic incremented number is sequential, hence we would look up the values
    #we thought it was missing, and remove them from missingValues, which should be faster than
    #checking all of them in the oracle database

    #############SECTION 2##########################
    print "Checking for numbers that are empty, Current Size:" + str(len(missingValues))
    emptyRow = []
    for idx, val in enumerate(missingValues):
        found = False
        for items in sequential_missing.objects.all():
            if(items.row == val and items.database == databaseName):
                found = True
                #print "Database:" + str(items.row) + ", Same as Empty Row:" + str(val)
        if(found == True):
            emptyRow.append(val)
    #############SECTION 2##########################

    #############SECTION 3##########################
    print "Removing empty numbers, Current Size:" + str(len(missingValues)) + ", Empty Row:" + str(len(emptyRow))
    missingValuesCompared = []
    for idx, val in enumerate(missingValues):
        found = False
        for items in emptyRow:
            if(val == items):
                found = True
                #print "Empty Row:" + str(items) + ", same as Missing Values:" + str(val)
        if(found == False):
            missingValuesCompared.append(val)

    print "New Size:" + str(len(missingValuesCompared))
    return missingValuesCompared
    #############SECTION 3##########################

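The code is divided into 3 sections:

  1. Find the missing sequential values.

  2. Compare those values against the model and collect any that match.

  3. Create a new array that excludes the rows found in Section 2.

The problem is that Section 2 takes a long time, O(n^2), because for every missing value it has to loop through the entire table to check whether that row was empty to begin with.

Is there a faster way to do this while using as little memory as possible?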

Edit:

Using ROW IN is much better:

    setItem = []
    for items in missingValues:
        setItem.append(items)
    print "Items in setItem:" + str(len(setItem))
    
    currentCounter = 0
    currentEndCounter = 500
    counterIncrement = 500
    emptyRowAppend = []
    end = False
    firstPass = False
    while(end == False):
        emptyRow = sequential_missing.objects.filter(database=databaseName, row__in = setItem[currentCounter:currentEndCounter])
        for items in emptyRow:
            emptyRowAppend.append(items.row)
        if(firstPass == True):
            end = True
        if ((currentEndCounter+counterIncrement)>maxValue):
            currentCounter += counterIncrement
            currentEndCounter = maxValue
            firstPass = True
        else:
            currentCounter += counterIncrement
            currentEndCounter += counterIncrement
    
    
    print "Removing empty numbers," + "Empty Row Append Size:" + str(len(emptyRowAppend)) + ", Missing Value Size:" + str(len(missingValues)) + ", Set Item Size:" + str(len(setItem)) +  ", Empty Row:" + str(len(emptyRowAppend))
    missingValuesCompared = []
    for idx, val in enumerate(missingValues):
        found = False
        for items in emptyRowAppend:
            if(val == items):
                found = True
                break
        if(found == False):
            missingValuesCompared.append(val)
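
The final loop above still compares every missing value against every entry of emptyRowAppend. As a minimal sketch, assuming missingValues is the set built in Section 1 and emptyRowAppend is the list of row numbers collected from the chunked queries, the same filtering could be done with a set difference, which relies on hash lookups instead of a nested loop. The function name remove_known_empty is hypothetical, and the sketch is written for Python 3 rather than the Python 2 used above.

```python
def remove_known_empty(missing_values, empty_rows):
    """Return the missing values that are not already recorded as empty.

    missing_values: iterable of ints believed to be missing locally.
    empty_rows: iterable of ints known to be empty on the Oracle side.
    """
    # Set difference uses hash lookups instead of a nested loop.
    still_missing = set(missing_values) - set(empty_rows)
    # Sort so the result is deterministic and easy to process in order.
    return sorted(still_missing)


if __name__ == "__main__":
    print(remove_known_empty({2, 5, 7, 9}, [5, 9]))  # prints [2, 7]
```

Building the two sets costs memory proportional to their sizes, so this trades some memory for speed.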
    

1 Answer:

Answer 0 (score: 1)

You can replace this code

emptyRow = []
for idx, val in enumerate(missingValues):
    found = False
    for items in sequential_missing.objects.all():
        if(items.row == val and items.database == databaseName):
            found = True
            #print "Database:" + str(items.row) + ", Same as Empty Row:" + str(val)
    if(found == True):
        emptyRow.append(val)

with

emptyRow = sequential_missing.objects.filter(database=databaseName, row__in=missingValues)

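so that you issue a single query to the database. However, this concatenates all of the missingValues into the string that has to be inserted into the query. You should try it and see whether it is viable.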
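
One reason the single query may not be viable: SQLite limits the number of bound parameters per statement (999 by default in older builds), so a very large row__in list can fail. A minimal sketch of splitting the lookup into chunks follows. It assumes a hypothetical callable fetch_rows(chunk) that returns the row numbers found for one chunk; with Django this could wrap the filter(...) call, as noted in the docstring. The names chunked, find_known_empty, and chunk_size are illustrative, and the sketch targets Python 3.

```python
def chunked(values, size):
    """Yield consecutive slices of `values`, each at most `size` long."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def find_known_empty(missing_values, fetch_rows, chunk_size=500):
    """Collect the rows that `fetch_rows` reports for each chunk.

    fetch_rows is a hypothetical callable; with Django it could be
        lambda chunk: sequential_missing.objects.filter(
            database=databaseName, row__in=chunk
        ).values_list('row', flat=True)
    """
    ordered = sorted(missing_values)  # a set cannot be sliced
    known_empty = set()
    for chunk in chunked(ordered, chunk_size):
        known_empty.update(fetch_rows(chunk))
    return known_empty


if __name__ == "__main__":
    # Stand-in for the database table: rows recorded as empty.
    recorded = {3, 4, 10}
    fake_fetch = lambda chunk: [v for v in chunk if v in recorded]
    print(sorted(find_known_empty({1, 3, 4, 8}, fake_fetch, chunk_size=2)))  # prints [3, 4]
```

Returning a set here means a later "is this value known to be empty?" check is a hash lookup rather than a scan of a list.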

Otherwise, you should sort both missingValues and sequential_missing.objects by value, so that you can find the matching items in linear time. Something like:

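missingValues = sorted(missingValues)  # a set cannot be indexed, so sort it into a list
emptyRow = []
val_index = 0
# restrict to the current database, as the original nested loop did
for item in sequential_missing.objects.filter(database=databaseName).order_by('row'):
  # skip missing values that are smaller than the current row
  while val_index < len(missingValues) and item.row > missingValues[val_index]:
    val_index += 1
  if val_index == len(missingValues):
    break  # no missing values left to match
  if item.row == missingValues[val_index]:
    emptyRow.append(item.row)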
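
To make the linear-time idea concrete without a database, here is a minimal Python 3 sketch of the same two-pointer walk over two sorted lists. sorted_rows stands in for the row values as they would arrive in order_by('row') order; the function name intersect_sorted is hypothetical.

```python
def intersect_sorted(sorted_missing, sorted_rows):
    """Return values present in both ascending lists, in one pass."""
    matches = []
    i = 0
    for row in sorted_rows:
        # Skip missing values smaller than the current row.
        while i < len(sorted_missing) and sorted_missing[i] < row:
            i += 1
        if i == len(sorted_missing):
            break  # nothing left to match
        if sorted_missing[i] == row:
            matches.append(row)
    return matches


if __name__ == "__main__":
    print(intersect_sorted([2, 5, 7, 9], [1, 5, 6, 9, 12]))  # prints [5, 9]
```

Each index only moves forward, so the walk itself is linear in the combined length of the two lists; the initial sort of the missing values adds an O(n log n) step.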