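I'm working with PostgreSQL (latest version) in a simple SCADA system. The problem is that every query involving the largest table in my schema [Figure 1] takes about 2 hours.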
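What I want is to get the PKs of the construction companies (constructora) of the buildings (edificio) whose departments (departamento) recorded measurements (valor in table medicion) of "gas" or "electricity" in 2010 that exceeded the allowed maximum. So...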
variable contains the maximum value allowed for the readings (valmax)
medicion contains the readings (valor)
Row counts per table:
constructora -> 10
edificio -> 100
departamento -> 50.000
variable -> 8
medicion_departamento -> 400.000
medicion -> 8.000.000
I wrote the query with INNER JOINs, going from the smallest tables (constructora and variable) to the largest table, medicion.
SELECT DISTINCT C.id_constructora FROM
constructora C
INNER JOIN variable V ON (V.nombre = 'electricidad' OR V.nombre = 'gas')
INNER JOIN edificio E ON (E.id_constructora = C.id_constructora)
INNER JOIN departamento D ON (E.id_edificio = D.id_edificio)
INNER JOIN medicion M ON (M.id_variable = V.id_variable)
WHERE (
(M.valor > V.valmax) AND
EXTRACT(YEAR FROM M.fecha) = 2010
);
"HashAggregate (cost=2343438.58..2343438.68 rows=10 width=4)"
" Group Key: c.id_constructora"
" -> Hash Join (cost=164536.25..1947605.25 rows=158333333 width=4)"
" Hash Cond: (e.id_constructora = c.id_constructora)"
" -> Hash Join (cost=4.25..1510.75 rows=50000 width=4)"
" Hash Cond: (d.id_edificio = e.id_edificio)"
" -> Seq Scan on departamento d (cost=0.00..819.00 rows=50000 width=4)"
" -> Hash (cost=3.00..3.00 rows=100 width=8)"
" -> Seq Scan on edificio e (cost=0.00..3.00 rows=100 width=8)"
" -> Hash (cost=164136.12..164136.12 rows=31670 width=4)"
" -> Nested Loop (cost=0.00..164136.12 rows=31670 width=4)"
" -> Nested Loop (cost=0.00..163739.12 rows=3167 width=0)"
" Join Filter: ((m.valor > v.valmax) AND (v.id_variable = m.id_variable))"
" -> Seq Scan on medicion m (cost=0.00..162408.00 rows=38000 width=8)"
" Filter: (date_part('year'::text, fecha) = '2010'::double precision)"
" -> Materialize (cost=0.00..1.13 rows=2 width=24)"
" -> Seq Scan on variable v (cost=0.00..1.12 rows=2 width=24)"
" Filter: (((nombre)::text = 'electricidad'::text) OR ((nombre)::text = 'gas'::text))"
" -> Materialize (cost=0.00..1.15 rows=10 width=4)"
" -> Seq Scan on constructora c (cost=0.00..1.10 rows=10 width=4)"
My question is: what can I do to significantly reduce the execution time of this query?
Answer 0 (score: 1)
An OR in a join condition is a performance killer. Perhaps this is what you want:
SELECT DISTINCT C.id_constructora
FROM constructora C INNER JOIN
edificio E
ON E.id_constructora = C.id_constructora INNER JOIN
departamento D
ON E.id_edificio = D.id_edificio LEFT JOIN
-- one copy of variable per measured quantity, instead of an OR
variable ve
ON ve.nombre = 'electricidad' LEFT JOIN
variable vg
ON vg.nombre = 'gas' LEFT JOIN
-- electricity readings from 2010 above the maximum
medicion me
ON me.id_variable = ve.id_variable AND
EXTRACT(YEAR FROM me.fecha) = 2010 AND
me.valor > ve.valmax LEFT JOIN
-- gas readings from 2010 above the maximum
medicion mg
ON mg.id_variable = vg.id_variable AND
EXTRACT(YEAR FROM mg.fecha) = 2010 AND
mg.valor > vg.valmax
-- keep rows where at least one of the two readings matched
WHERE me.id_variable IS NOT NULL OR mg.id_variable IS NOT NULL;
I also suspect that you want the conditions mg.id_departamento = d.id_departamento and me.id_departamento = d.id_departamento in the ON clauses of the joins.
This assumes that you have appropriate indexes on the join keys (in particular, the primary keys). In addition, you want indexes on variable(nombre, valmax) and medicion(id_variable, fecha, valor). If you use id_departamento, it should be the first key in the index.
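As a rough sketch, the indexes suggested in this answer could be created like this. The index names are made up for illustration, the valmax column on variable is taken from the queries in this thread, and the last statement assumes that medicion has an id_departamento column, so check all of it against the actual schema.

```sql
-- Hypothetical index names; columns taken from the queries in this thread.
CREATE INDEX idx_variable_nombre_valmax
    ON variable (nombre, valmax);

CREATE INDEX idx_medicion_variable_fecha_valor
    ON medicion (id_variable, fecha, valor);

-- Variant for joins that also match on id_departamento
-- (assumes medicion has that column): it goes first in the key list.
CREATE INDEX idx_medicion_depto_variable_fecha_valor
    ON medicion (id_departamento, id_variable, fecha, valor);
```

On a table with 8.000.000 rows each build takes a while; CREATE INDEX CONCURRENTLY is an option if writes to medicion must not be blocked in the meantime.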
Answer 1 (score: 1)
It seems (according to the schema) that your query should be:
SELECT DISTINCT C.id_constructora FROM
constructora C
INNER JOIN edificio E ON (E.id_constructora = C.id_constructora)
INNER JOIN departamento D ON (E.id_edificio = D.id_edificio)
-- Changes here
INNER JOIN medicion M ON (D.id_departamento = M.id_departamento)
INNER JOIN variable V ON (M.id_variable = V.id_variable AND (V.nombre = 'electricidad' OR V.nombre = 'gas'))
WHERE (
(M.valor > V.valmax) AND
EXTRACT(YEAR FROM M.fecha) = 2010
);
That is because, in the original query, the medicion and variable tables are cross joined with all the other tables.
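The cross join is visible in the plan from the question: nothing ties the roughly 3167 qualifying medicion rows to a departamento, so the top hash join is estimated at about 3167 × 50000 ≈ 158 million rows (rows=158333333). A sketch for checking that a rewritten query no longer does this, using EXPLAIN with the ANALYZE and BUFFERS options; like the query above, it assumes that medicion has an id_departamento column.

```sql
-- Note: EXPLAIN ANALYZE really executes the query and reports actual row counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT C.id_constructora
FROM constructora C
INNER JOIN edificio E ON E.id_constructora = C.id_constructora
INNER JOIN departamento D ON D.id_edificio = E.id_edificio
INNER JOIN medicion M ON M.id_departamento = D.id_departamento
INNER JOIN variable V ON V.id_variable = M.id_variable
WHERE V.nombre IN ('electricidad', 'gas')
  AND M.valor > V.valmax
  AND EXTRACT(YEAR FROM M.fecha) = 2010;
```

In the output, every join node should now show a join condition, and the row estimate for the final join should be in the order of the medicion rows that pass the filters rather than a multiple of 50000.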
Also, you should have a functional index like
create index idx_medicion_fecha_year on medicion ((EXTRACT(YEAR FROM fecha)));
for the EXTRACT(YEAR FROM fecha) = 2010 condition.
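An alternative that avoids the expression index, sketched here under the assumption that fecha is a date or timestamp column: write the year filter as a half-open range, which an ordinary b-tree index on fecha can serve. The index name is hypothetical.

```sql
-- Hypothetical index name; a plain b-tree index on the date column.
CREATE INDEX idx_medicion_fecha ON medicion (fecha);

-- Selects the same rows as EXTRACT(YEAR FROM M.fecha) = 2010,
-- but as a range the planner can match against the index on fecha.
SELECT count(*)
FROM medicion M
WHERE M.fecha >= DATE '2010-01-01'
  AND M.fecha <  DATE '2011-01-01';
```

The same range predicate can replace the EXTRACT condition in the full query. It is also what lets the fecha column of a composite index such as medicion(id_variable, fecha, valor) be used to narrow the scan.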