Query takes too long

Time: 2017-07-20 00:12:09

Tags: sql postgresql join query-optimization

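I'm working with PostgreSQL (latest version) on a simple SCADA system. The problem is that every query involving the largest table in my schema [Figure 1] takes about 2 hours.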

Schema

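What I want is to get the PK of the construction companies (constructora) of the buildings (edificio) whose departments (departamento) recorded measurements (valor in table medicion) above the maximum for 'gas' or 'electricity' in 2010. So: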

  • variable contains the maximum value allowed for the records (valmax)
  • medicion contains the value of the indicator (valor)

Table sizes (in rows):

  • constructora -> 10
  • edificio -> 100
  • departamento -> 50,000
  • variable -> 8
  • medicion_departamento -> 400,000
  • medicion -> 8,000,000

Query

I wrote the query with INNER JOINs, going from the smallest tables (constructora, variable) to the largest table (medicion):

SELECT DISTINCT C.id_constructora FROM
constructora C
INNER JOIN variable V ON (V.nombre = 'electricidad' OR V.nombre = 'gas')
INNER JOIN edificio E ON (E.id_constructora = C.id_constructora)
INNER JOIN departamento D ON (E.id_edificio = D.id_edificio)
INNER JOIN medicion M ON (M.id_variable = V.id_variable)
WHERE (
    (M.valor > V.valmax) AND
    EXTRACT(YEAR FROM M.fecha) = 2010
);

EXPLAIN

"HashAggregate  (cost=2343438.58..2343438.68 rows=10 width=4)"
"  Group Key: c.id_constructora"
"  ->  Hash Join  (cost=164536.25..1947605.25 rows=158333333 width=4)"
"        Hash Cond: (e.id_constructora = c.id_constructora)"
"        ->  Hash Join  (cost=4.25..1510.75 rows=50000 width=4)"
"              Hash Cond: (d.id_edificio = e.id_edificio)"
"              ->  Seq Scan on departamento d  (cost=0.00..819.00 rows=50000 width=4)"
"              ->  Hash  (cost=3.00..3.00 rows=100 width=8)"
"                    ->  Seq Scan on edificio e  (cost=0.00..3.00 rows=100 width=8)"
"        ->  Hash  (cost=164136.12..164136.12 rows=31670 width=4)"
"              ->  Nested Loop  (cost=0.00..164136.12 rows=31670 width=4)"
"                    ->  Nested Loop  (cost=0.00..163739.12 rows=3167 width=0)"
"                          Join Filter: ((m.valor > v.valmax) AND (v.id_variable = m.id_variable))"
"                          ->  Seq Scan on medicion m  (cost=0.00..162408.00 rows=38000 width=8)"
"                                Filter: (date_part('year'::text, fecha) = '2010'::double precision)"
"                          ->  Materialize  (cost=0.00..1.13 rows=2 width=24)"
"                                ->  Seq Scan on variable v  (cost=0.00..1.12 rows=2 width=24)"
"                                      Filter: (((nombre)::text = 'electricidad'::text) OR ((nombre)::text = 'gas'::text))"
"                    ->  Materialize  (cost=0.00..1.15 rows=10 width=4)"
"                          ->  Seq Scan on constructora c  (cost=0.00..1.10 rows=10 width=4)"

My question is: what can I do to reduce the query's execution time significantly?

2 answers:

Answer 0: (score: 1)

An OR in a join condition is a performance killer. Perhaps this is what you want:

SELECT DISTINCT C.id_constructora
FROM constructora C INNER JOIN
     edificio E
     ON E.id_constructora = C.id_constructora INNER JOIN
     departamento D
     ON E.id_edificio = D.id_edificio LEFT JOIN
     variable ve
     ON ve.nombre = 'electricidad' LEFT JOIN
     variable vg
     ON vg.nombre = 'gas' LEFT JOIN
     medicion me
     ON me.id_variable = ve.id_variable AND
        EXTRACT(YEAR FROM me.fecha) = 2010 AND
         me.valor > ve.valmax LEFT JOIN
     medicion mg
     ON mg.id_variable = vg.id_variable AND
        EXTRACT(YEAR FROM mg.fecha) = 2010 AND
        mg.valor > vg.valmax
WHERE me.id_variable IS NOT NULL OR mg.id_variable IS NOT NULL;

I also suspect you want the conditions mg.id_departamento = d.id_departamento and me.id_departamento = d.id_departamento in the ON clauses.

This assumes that you have appropriate indexes on the join keys (particularly the primary keys). In addition, you want indexes on variable(nombre, valmax) and medicion(id_variable, fecha, valor). If you use id_departamento, it should be the first key in the index.
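
The index suggestions above could be written roughly as follows. This is only a sketch: the index names are made up, valmax is used for variable because that is the column the question describes, and the id_departamento column on medicion is an assumption taken from the suggested join condition, not something the schema image confirms.

```sql
-- Hypothetical index names; adjust columns to the real schema.
CREATE INDEX idx_variable_nombre_valmax
    ON variable (nombre, valmax);

CREATE INDEX idx_medicion_variable_fecha_valor
    ON medicion (id_variable, fecha, valor);

-- Only if the join also matches on id_departamento
-- (assumes medicion has that column): put it first.
CREATE INDEX idx_medicion_depto_variable_fecha_valor
    ON medicion (id_departamento, id_variable, fecha, valor);
```

Whether the planner actually uses these depends on the data distribution, so it is worth rerunning EXPLAIN after creating them.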

Answer 1: (score: 1)

It seems (according to the schema) that your query should be:

SELECT DISTINCT C.id_constructora FROM
constructora C
  INNER JOIN edificio E ON (E.id_constructora = C.id_constructora)
  INNER JOIN departamento D ON (E.id_edificio = D.id_edificio)
  -- Changes here
  INNER JOIN medicion M ON (D.id_departamento = M.id_departamento)
  INNER JOIN variable V ON (M.id_variable = V.id_variable AND (V.nombre = 'electricidad' OR V.nombre = 'gas'))
WHERE (
    (M.valor > V.valmax) AND
    EXTRACT(YEAR FROM M.fecha) = 2010
);

Because in the original query, the medicion and variable tables are cross-joined with all the other tables.
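
The row estimate at the top of the EXPLAIN output in the question is consistent with that. A rough check, using only numbers copied from that plan:

```sql
-- Estimates copied from the EXPLAIN output in the question:
--   50000 rows from departamento joined to edificio
--   31670 rows from medicion x variable x constructora
--   10 distinct id_constructora values on the hash join key
SELECT 50000 * 31670 / 10 AS approx_joined_rows;
-- 158350000, close to the planner's rows=158333333
```

In other words, every qualifying measurement is paired with every departamento, which is why the intermediate result is so large.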

Also, you should have a functional index such as

create index idx_medicion_fecha_year on medicion ((EXTRACT(YEAR FROM fecha)));

for the EXTRACT(YEAR FROM fecha) = 2010 condition.
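
A different technique, not part of the answer above, is to replace the EXTRACT condition with a half-open date range so that a plain index on fecha can be used. The sketch below assumes fecha is a date or timestamp column and that medicion has an id_departamento column, as in the query above; the index name is hypothetical.

```sql
-- Plain b-tree index on the raw column (hypothetical name).
CREATE INDEX idx_medicion_fecha ON medicion (fecha);

SELECT DISTINCT C.id_constructora
FROM constructora C
  INNER JOIN edificio E ON E.id_constructora = C.id_constructora
  INNER JOIN departamento D ON D.id_edificio = E.id_edificio
  INNER JOIN medicion M ON M.id_departamento = D.id_departamento
  INNER JOIN variable V ON V.id_variable = M.id_variable
                       AND V.nombre IN ('electricidad', 'gas')
WHERE M.valor > V.valmax
  AND M.fecha >= DATE '2010-01-01'   -- half-open range covering all of 2010
  AND M.fecha <  DATE '2011-01-01';
```

If a large share of the rows falls in 2010, the planner may still prefer a sequential scan, so compare the EXPLAIN output before and after.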