我正在使用Django 1.9。我有一个Django表,表示按月组织的特定度量值,原始值和百分位数:
class MeasureValue(models.Model):
org = models.ForeignKey(Org, null=True, blank=True)
month = models.DateField()
calc_value = models.FloatField(null=True, blank=True)
percentile = models.FloatField(null=True, blank=True)
每月通常有10,000左右。我的问题是我是否可以加快在模型上设置值的过程。
目前,我通过使用Django过滤器查询检索一个月的所有度量值,将其转换为pandas数据帧,然后使用scipy的rankdata
来设置等级和百分位来计算百分位数。我这样做是因为pandas和rankdata
是高效的,能够忽略空值,能够以我想要的方式处理重复值,所以我对这种方法感到满意:
records = MeasureValue.objects.filter(month=month).values()
df = pd.DataFrame.from_records(records)
// use calc_value to set percentile on each row, using scipy's rankdata
但是,我需要从数据框中检索每个百分位值,并将其重新设置到模型实例上。现在我通过迭代数据帧的行并更新每个实例来实现这一点:
for i, row in df.iterrows():
mv = MeasureValue.objects.get(org=row.org, month=month)
if (row.percentile is None) or np.isnan(row.percentile):
row.percentile = None
mv.percentile = row.percentile
mv.save()
这不足为奇。是否有任何有效的Django方法来加速它,通过单个数据库写入而不是数万个?我有checked the documentation,但看不到一个。
答案 0 :(得分:15)
原子事务可以减少在循环中花费的时间:
from django.db import transaction
with transaction.atomic():
for i, row in df.iterrows():
mv = MeasureValue.objects.get(org=row.org, month=month)
if (row.percentile is None) or np.isnan(row.percentile):
# if it's already None, why set it to None?
row.percentile = None
mv.percentile = row.percentile
mv.save()
Django的默认行为是在自动提交模式下运行。除非事务处于活动状态,否则每个查询都会立即提交到数据库。
通过使用with transaction.atomic()
,所有插入都被分组到一个事务中。提交事务所需的时间在所有随附的insert语句中分摊,因此每个insert语句的时间大大减少。
答案 1 :(得分:1)
从Django 2.2开始,您可以使用bulk_update()
queryset方法来有效地更新所提供的模型实例上的给定字段,通常使用一个查询:
Case
在旧版本的Django中,您可以将update()
与When
/ from django.db.models import Case, When
Entry.objects.filter(
pk__in=headlines # `headlines` is a pk -> headline mapping
).update(
headline=Case(*[When(pk=entry_pk, then=headline)
for entry_pk, headline in headlines.items()]))
配合使用,例如:
# Just add types once. There is no need to add the types in each function.
Add-Type -AssemblyName System.Windows.Forms;
Add-Type -AssemblyName System.Drawing
function GetDetails() {
$browser = New-Object System.Windows.Forms.OpenFileDialog;
$browser.Filter = "txt (*.txt)|*.txt";
$browser.InitialDirectory = "E:\";
$browser.Title = "select txt file";
$browserResult = $browser.ShowDialog();
# Combined the if statements
if($browserResult -eq [System.Windows.Forms.DialogResult]::OK -and [string]::IsNullOrWhiteSpace($txtFile) -ne $true) {
$nfoFile = $browser.FileName;
$txtFile = [System.IO.Path]::ChangeExtension($nfoFile, ".dac");
$txtFile = $temp + [System.IO.Path]::GetFileName($txtFile);
$exeArgs = "-f -S `"$txtFile`" -O `"$txtFile`"";
Start-Process $anExe -ArgumentList $exeArgs -Wait;
# The Raw flag should return a string
$result = Get-Content $txtFile -Raw;
$browser.Dispose();
return $result;
}
# No need for else since the if statement returns
return GetFromForm;
}
function GetFromForm(){
$form = New-Object System.Windows.Forms.Form;
$form.Text = 'Adding Arguments'
$form.Size = New-Object System.Drawing.Size(816,600)
$form.StartPosition = 'CenterScreen'
# Added a button
$OKButton = New-Object System.Windows.Forms.Button
$OKButton.Location = New-Object System.Drawing.Point(585,523)
$OKButton.Size = New-Object System.Drawing.Size(75,23)
$OKButton.Text = 'OK'
$OKButton.DialogResult = [System.Windows.Forms.DialogResult]::OK
$form.AcceptButton = $OKButton
$form.Controls.Add($OKButton)
$txtBox = New-Object System.Windows.Forms.TextBox;
$txtBox.Multiline = $true;
$txtBox.AcceptsReturn = $true;
$txtBox.AcceptsTab = $true;
$txtBox.Visible = $true;
$txtBox.Name = "txtName";
$txtBox.Size = New-Object System.Drawing.Size(660,500)
$form.Controls.Add($txtBox);
# Needed to force it to show on top
$form.TopMost = $true
# Select the textbox and activate the form to make it show with focus
$form.Add_Shown({$txtBox.Select(), $form.Activate()})
# Finally show the form and assign the ShowDialog method to a variable (this keeps it from printing out Cancel)
$result = $form.ShowDialog();
# If the user hit the OK button return the text in the textbox
if ($result -eq [System.Windows.Forms.DialogResult]::OK)
{
return $txtBox.Text
}
}
$desc = GetDetails;
cls;
Write-Host $desc;
答案 2 :(得分:0)
实际上,尝试@Eugene Yarmash 的回答我发现我收到了这个错误:
FieldError: Joined field references are not permitted in this query
但我相信迭代 update
仍然比多次保存要快,我希望使用事务也应该加快速度。
因此,对于不提供 bulk_update
的 Django 版本,假设 Eugene 的答案中使用的数据相同,其中 headlines
是 pk -> 标题映射:
from django.db import transaction
with transaction.atomic():
for entry_pk, headline in headlines.items():
Entry.objects.filter(pk=entry_pk).update(headline=headline)