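I set up a query in SQLAlchemy that was running a bit slowly and tried to optimize it. For some unknown reason, the result uses an implicit cross join, which is both significantly slower and returns completely wrong results. I have anonymized the table names and parameters but otherwise changed nothing. Does anyone know where this comes from?
To make the difference easier to spot: compared with the old SQL, the new SQL has a longer SELECT and lists all three tables in the FROM clause before any JOIN.
Original code: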
    cust_name = u'Bob'
    proj_name = u'job1'
    item_color = u'blue'
    query = (db.session.query(Item.name)
             .join(Project, Customer)
             .filter(Customer.name == cust_name,
                     Project.name == proj_name)
             .distinct(Item.name))
    # some conditionals determining last filter, resolving to this one:
    query = query.filter(Item.color == item_color)
    result = query.all()
Original emitted SQL, as logged by flask_sqlalchemy.get_debug_queries:
    QUERY: SELECT DISTINCT ON (items.name) items.name AS items_name
    FROM items JOIN projects ON projects.id = items._project_id JOIN customers ON customers.id = projects._customer_id
    WHERE customers.name = %(name_1)s AND projects.name = %(name_2)s AND items.color = %(color_1)s
    Parameters: {'name_2': u'job1', 'color_1': u'blue', 'name_1': u'Bob'}
New code:
    cust_name = u'Bob'
    proj_name = u'job1'
    item_color = u'blue'
    query = (db.session.query(Item)
             .options(Load(Item).load_only('name', 'color'),
                      joinedload(Item.project, innerjoin=True).load_only('name').
                      joinedload(Project.customer, innerjoin=True).load_only('name'))
             .filter(Customer.name == cust_name,
                     Project.name == proj_name)
             .distinct(Item.name))
    # some conditionals determining last filter, resolving to this one:
    query = query.filter(Item.color == item_color)
    result = query.all()
New emitted SQL, as logged by flask_sqlalchemy.get_debug_queries:
    QUERY: SELECT DISTINCT ON (items.name) items.id AS items_id, items.name AS items_name, items.color AS items_color, items._project_id AS items__project_id, customers_1.id AS customers_1_id, customers_1.name AS customers_1_name, projects_1.id AS projects_1_id, projects_1.name AS projects_1_name
    FROM customers, projects, items JOIN projects AS projects_1 ON projects_1.id = items._project_id JOIN customers AS customers_1 ON customers_1.id = projects_1._customer_id
    WHERE customers.name = %(name_1)s AND projects.name = %(name_2)s AND items.color = %(color_1)s
    Parameters: {'color_1': u'blue', 'name_2': u'job1', 'name_1': u'Bob'}
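In case it matters, the underlying database is PostgreSQL.
The original intent of the query only requires Item.name. The longer I think about it, the less likely it seems that this optimization attempt is actually useful, but I would still like to know where the cross join comes from, in case it happens again somewhere that adding joinedload, load_only, and so on would genuinely help.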
Answer 0 (score: 3)
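This happens because joinedload is not the same as join. Entities loaded with joinedload are effectively anonymous: they are joined under generated aliases (projects_1 and customers_1 in your SQL). The filters you apply afterwards refer to Customer and Project directly, which are different instances of the same tables, so customers and projects each end up in the query twice: once as the aliased JOIN and once as a bare table in FROM with nothing tying it to items.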
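The effect of the doubled tables can be reproduced outside SQLAlchemy. Below is a minimal sketch using Python's built-in sqlite3 module with made-up sample data (the rows 'Alice', 'job2', 'widget' and 'gadget' are assumptions for illustration, not from the question). SQLite has no DISTINCT ON, so that part is left out; only the FROM/WHERE shape of the two logged queries is kept.

```python
import sqlite3

# Hypothetical sample data: two customers, each with one project and one blue item.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT, _customer_id INTEGER);
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, color TEXT, _project_id INTEGER);
INSERT INTO customers VALUES (1, 'Bob'), (2, 'Alice');
INSERT INTO projects VALUES (1, 'job1', 1), (2, 'job2', 2);
INSERT INTO items VALUES (1, 'widget', 'blue', 1), (2, 'gadget', 'blue', 2);
""")

params = {"cust": "Bob", "proj": "job1", "color": "blue"}

# Shape of the joinedload query: the filters hit the bare tables, while the
# JOINs use the aliases projects_1 / customers_1, so nothing links the two.
cross_join_sql = """
SELECT items.name
FROM customers, projects, items
JOIN projects AS projects_1 ON projects_1.id = items._project_id
JOIN customers AS customers_1 ON customers_1.id = projects_1._customer_id
WHERE customers.name = :cust AND projects.name = :proj AND items.color = :color
ORDER BY items.name
"""

# Shape of the explicit-join query: filters and JOINs refer to the same tables.
joined_sql = """
SELECT items.name
FROM items
JOIN projects ON projects.id = items._project_id
JOIN customers ON customers.id = projects._customer_id
WHERE customers.name = :cust AND projects.name = :proj AND items.color = :color
ORDER BY items.name
"""

cross_join_rows = [row[0] for row in conn.execute(cross_join_sql, params)]
joined_rows = [row[0] for row in conn.execute(joined_sql, params)]

print(cross_join_rows)  # includes Alice's item as well
print(joined_rows)      # only Bob's item from job1
```

In the first query the customer and project filters only check that some matching row exists in the bare tables, so every blue item passes, regardless of which customer it belongs to.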
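You should do an explicit join as before, but use contains_eager so that the join you wrote is also used to populate the relationships, the way joinedload would.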
    query = (session.query(Item)
             .join(Item.project)
             .join(Project.customer)
             .options(Load(Item).load_only('name', 'color'),
                      Load(Item).contains_eager("project").load_only('name'),
                      Load(Item).contains_eager("project").contains_eager("customer").load_only('name'))
             .filter(Customer.name == cust_name,
                     Project.name == proj_name)
             .distinct(Item.name))
This gives you the following query:
    SELECT DISTINCT ON (items.name) customers.id AS customers_id, customers.name AS customers_name, projects.id AS projects_id, projects.name AS projects_name, items.id AS items_id, items.name AS items_name, items.color AS items_color
    FROM items JOIN projects ON projects.id = items._project_id JOIN customers ON customers.id = projects._customer_id
    WHERE customers.name = %(name_1)s AND projects.name = %(name_2)s AND items.color = %(color_1)s
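To catch this symptom if it shows up again, one option is to scan the logged SQL for several comma-separated tables at the start of the FROM clause. The helper below is a rough heuristic sketch, not part of SQLAlchemy or Flask-SQLAlchemy; the function names are hypothetical, and it assumes a single flat SELECT without subqueries and only inspects the part of FROM before the first JOIN.

```python
import re


def tables_before_first_join(sql):
    """Return the comma-separated FROM entries that appear before the first JOIN."""
    match = re.search(r"\bFROM\b(.*?)(?:\bWHERE\b|$)", sql,
                      flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return []
    # Keep only the text between FROM and the first JOIN keyword.
    head = re.split(r"\bJOIN\b", match.group(1), maxsplit=1,
                    flags=re.IGNORECASE)[0]
    return [part.strip() for part in head.split(",") if part.strip()]


def has_implicit_cross_join(sql):
    """Heuristic: more than one table listed before any JOIN suggests a cross join."""
    return len(tables_before_first_join(sql)) > 1


# Abbreviated versions of the two FROM shapes shown in this thread.
new_style = (
    "SELECT items.id, items.name "
    "FROM customers, projects, items "
    "JOIN projects AS projects_1 ON projects_1.id = items._project_id "
    "WHERE customers.name = 'Bob'"
)
fixed_style = (
    "SELECT items.id, items.name "
    "FROM items JOIN projects ON projects.id = items._project_id "
    "JOIN customers ON customers.id = projects._customer_id "
    "WHERE customers.name = 'Bob'"
)

print(has_implicit_cross_join(new_style))
print(has_implicit_cross_join(fixed_style))
```

A check like this could be run over the statements returned by get_debug_queries in a test; it only flags candidates for a closer look, since a comma-separated FROM is sometimes intentional.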
Answer 1 (score: 1)
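I am not sure what you are trying to achieve, but it looks like you want an inner join between the tables while selecting only specific columns.
So I think you need to do something like this: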
    cust_name = u'Bob'
    proj_name = u'job1'
    item_color = u'blue'
    query = (db.session.query(Item.name)
             .join(Project, Customer)
             .filter(Customer.name == cust_name,
                     Project.name == proj_name)
             .distinct(Item.name))
    # Select the loaded columns
    query = query.add_columns(Item.name, Item.color, Project.name, Customer.name)
    # some conditionals determining last filter, resolving to this one:
    query = query.filter(Item.color == item_color)
    result = query.all()
FWIW, I don't think this will give your query any significant optimization.