我正在使用Django ORM从数百万个项目的数据库中获取数据。但是,计算需要一段时间(40分钟+),而且我不确定如何确定问题的位置。
我使用的模型:
class user_chartConfigurationData(models.Model):
username_chartNum = models.ForeignKey(user_chartConfiguration, related_name='user_chartConfigurationData_username_chartNum')
openedConfig = models.ForeignKey(user_chartConfigurationChartID, related_name='user_chartConfigurationData_user_chartConfigurationChartID')
username_selects = models.CharField(max_length=200)
blockName = models.CharField(max_length=200)
stage = models.CharField(max_length=200)
variable = models.CharField(max_length=200)
condition = models.CharField(max_length=200)
value = models.CharField(max_length=200)
type = models.CharField(max_length=200)
order = models.IntegerField()
def __unicode__(self):
return str(self.username_chartNum)
order = models.IntegerField()
class data_parsed(models.Model):
setid = models.ForeignKey(sett, related_name='data_parsed_setid', primary_key=True)
setid_hash = models.CharField(max_length=100, db_index = True)
block = models.CharField(max_length=2000, db_index = True)
username = models.CharField(max_length=2000, db_index = True)
time = models.IntegerField(db_index = True)
time_string = models.CharField(max_length=200, db_index = True)
def __unicode__(self):
return str(self.setid)
class unique_variables(models.Model):
setid = models.ForeignKey(sett, related_name='unique_variables_setid')
setid_hash = models.CharField(max_length=100, db_index = True)
block = models.CharField(max_length=200, db_index = True)
stage = models.CharField(max_length=200, db_index = True)
variable = models.CharField(max_length=200, db_index = True)
value = models.CharField(max_length=2000, db_index = True)
class Meta:
unique_together = (("setid", "block", "variable", "stage", "value"),)
我正在运行的代码循环遍历data_parsed,其中相关数据在user_chartConfigurationData和unique_variables之间匹配。
#After we get the tab, we will get the configuration data from the config button. We will need the tab ID, which is chartNum, and the actual chart
#That is opened, which is the chartID.
chartIDKey = user_chartConfigurationChartID.objects.get(chartID = chartID)
for i in user_chartConfigurationData.objects.filter(username_chartNum = chartNum, openedConfig = chartIDKey).order_by('order').iterator():
iterator = data_parsed.objects.all().iterator()
#We will loop through parsed objects, and at the same time using the setid (unique for all blocks), which contains multiple
#variables. Using the condition, we can set the variable gte (greater than equal), or lte (less than equal), so that the condition match
#the setid for the data_parsed object, and variable condition
for contents in iterator:
#These are two flags, found is when we already have an entry inside a dictionary that already
#matches the same setid. Meaning they are the same blocks. For example FlowBranch and FlowPure can belong
#to the same block. Hence when we find an entry that matches the same id, we will put it in the same dictionary.
#Added is used when the current item does not map to a previous setid entry in the dictionary. Then we will need
#to add this new entry to the array of dictionary (set_of_pk_values). Otherwise, we will be adding a lot
#of entries that doesn't have any values for variables (because the value was added to another entry inside a dictionary)
found = False
added = False
storeItem = {}
#Initial information for the row
storeItem['block'] = contents.block
storeItem['username'] = contents.username
storeItem['setid'] = contents.setid
storeItem['setid_hash'] = contents.setid_hash
if (i.variable != ""):
for findPrevious in set_of_pk_values:
if(str(contents.setid) == str(findPrevious['setid'])):
try:
items = unique_variables.objects.get(setid = contents.setid, variable = i.variable)
findPrevious[variableName] = items.value
found = True
break
except:
pass
if(found == False):
try:
items = unique_variables.objects.get(setid = contents.setid, variable = i.variable)
storeItem[variableName] = items.value
added = True
except:
pass
if(found == False and added == True):
storeItem['time_string'] = contents.time_string
set_of_pk_values.append(storeItem)
我尝试使用select_related()或prefetch_related(),因为它需要转到unique_variables对象并获取一些数据,但是,它仍然需要很长时间。
有没有更好的方法来解决这个问题?
答案 0 :(得分:2)
当然,请看django_debug_toolbar。它会告诉您执行的查询数量以及持续时间。当我必须优化某些东西=)时,真的无法生存。
PS:执行速度会更慢。
修改:您可能还希望为用于过滤的字段启用db_index,或为index_together启用多个字段。 Ofc,衡量您的更改之间的时间,以确保哪个选项更好。