SQLAlchemy emits a cross join for no apparent reason

Time: 2016-09-01 17:50:49

Tags: python postgresql sqlalchemy

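I had a query set up in SQLAlchemy that was running a bit slowly, so I tried to optimize it. For some unknown reason, the result uses an implicit cross join, which is both significantly slower and returns entirely wrong results. I have anonymized the table names and parameters but otherwise changed nothing. Does anyone know where this is coming from?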

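To make it easier to spot: the difference between the old and new SQL is that the new SQL has a longer SELECT and lists all three tables in the FROM clause before any JOIN.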
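For readers unfamiliar with the comma syntax: tables listed in FROM separated by commas, with no condition tying them together, form an implicit cross join. The following is a minimal sketch using Python's built-in sqlite3 module rather than the PostgreSQL setup from the question; the rows are made up purely for illustration.

```python
import sqlite3

# In-memory SQLite database with made-up rows (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT, _customer_id INTEGER);
    INSERT INTO customers (id, name) VALUES (1, 'Bob'), (2, 'Alice'), (3, 'Carol');
    INSERT INTO projects (id, name, _customer_id) VALUES (1, 'job1', 1), (2, 'job2', 2);
""")

# Comma-separated FROM with no join condition: every customer is paired
# with every project.
cross_rows = conn.execute(
    "SELECT count(*) FROM customers, projects"
).fetchone()[0]

# Explicit JOIN ... ON: only the matching customer/project pairs remain.
joined_rows = conn.execute(
    "SELECT count(*) FROM projects "
    "JOIN customers ON customers.id = projects._customer_id"
).fetchone()[0]

print(cross_rows, joined_rows)
```

With three customers and two projects, the comma form produces 3 x 2 = 6 rows while the explicit join produces 2, which is why this pattern gets expensive quickly on real tables.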

Original code:

cust_name = u'Bob'
proj_name = u'job1'
item_color = u'blue'
query = (db.session.query(Item.name)
                       .join(Project, Customer)
                       .filter(Customer.name == cust_name,
                               Project.name == proj_name)
                       .distinct(Item.name))

# some conditionals determining last filter, resolving to this one:
query = query.filter(Item.color == item_color)

result = query.all()

Original emitted SQL, as logged by flask_sqlalchemy.get_debug_queries:

QUERY: SELECT DISTINCT ON (items.name) items.name AS items_name
FROM items JOIN projects ON projects.id = items._project_id JOIN customers ON customers.id = projects._customer_id
WHERE customers.name = %(name_1)s AND projects.name = %(name_2)s AND items.color = %(color_1)s
Parameters: {'name_2': u'job1', 'state_1': u'blue', 'name_1': u'Bob'}

New code:

cust_name = u'Bob'
proj_name = u'job1'
item_color = u'blue'
query = (db.session.query(Item)
                     .options(Load(Item).load_only('name', 'color'),
                                joinedload(Item.project, innerjoin=True).load_only('name').
                                joinedload(Project.customer, innerjoin=True).load_only('name'))
                     .filter(Customer.name == cust_name,
                                 Project.name == proj_name)
                     .distinct(Item.name))

# some conditionals determining last filter, resolving to this one:
query = query.filter(Item.color == item_color)

result = query.all()

New emitted SQL, as logged by flask_sqlalchemy.get_debug_queries:

QUERY: SELECT DISTINCT ON (items.nygc_id) items.id AS items_id, items.name AS items_name, items.color AS items_color, items._project_id AS items__project_id, customers_1.id AS customers_1_id, customers_1.name AS customers_1_name, projects_1.id AS projects_1_id, projects_1.name AS projects_1_name
FROM customers, projects, items JOIN projects AS projects_1 ON projects_1.id = items._project_id JOIN customers AS customers_1 ON customers_1.id = projects_1._customer_id
WHERE customers.name = %(name_1)s AND projects.name = %(name_2)s AND items.color = %(color_1)s
Parameters: {'state_1': u'blue', 'name_2': u'job1', 'name_1': u'Bob'}

In case it matters, the underlying database is PostgreSQL.

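The original intent of the query only requires Item.name. The longer I think about it, the less likely it seems that this optimization attempt would actually help, but I would still like to know where the cross join came from, in case it happens again somewhere that adding joinedload, load_only, and so on actually would help.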

2 Answers:

Answer 0 (score: 3)

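This is because joinedload is different from join. The joinedload-ed entities are effectively anonymous (they are joined under aliases such as projects_1 and customers_1), and the filters you apply afterwards refer to different instances of the same tables, so customers and projects each end up in the query twice.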
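To make the effect concrete, here is a hedged sketch that runs both query shapes from the question against a tiny SQLite database using Python's built-in sqlite3 module. The rows are invented for illustration, the SELECT list is trimmed to items.name, and PostgreSQL's DISTINCT ON is left out because SQLite does not support it.

```python
import sqlite3

# In-memory SQLite database with made-up rows (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT, _customer_id INTEGER);
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, color TEXT, _project_id INTEGER);
    INSERT INTO customers (id, name) VALUES (1, 'Bob'), (2, 'Alice');
    INSERT INTO projects (id, name, _customer_id) VALUES (1, 'job1', 1), (2, 'job2', 2);
    INSERT INTO items (id, name, color, _project_id)
        VALUES (1, 'widget', 'blue', 1), (2, 'gadget', 'blue', 2);
""")
params = ("Bob", "job1", "blue")

# Shape of the original query: the filters apply to the joined tables.
correct = [row[0] for row in conn.execute("""
    SELECT items.name
    FROM items
    JOIN projects ON projects.id = items._project_id
    JOIN customers ON customers.id = projects._customer_id
    WHERE customers.name = ? AND projects.name = ? AND items.color = ?
    ORDER BY items.name
""", params)]

# Shape of the new query: the joins use aliases (projects_1, customers_1),
# while the filters hit separate, unjoined copies of customers and projects.
broken = [row[0] for row in conn.execute("""
    SELECT items.name
    FROM customers, projects, items
    JOIN projects AS projects_1 ON projects_1.id = items._project_id
    JOIN customers AS customers_1 ON customers_1.id = projects_1._customer_id
    WHERE customers.name = ? AND projects.name = ? AND items.color = ?
    ORDER BY items.name
""", params)]

print(correct)
print(broken)
```

In this sketch the first query returns only Bob's item, while the second also returns 'gadget', which belongs to Alice's project. The filters on the unaliased customers and projects tables only check that some matching customer and project exist; they place no constraint on which items are returned.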

You should do a join as you did before, but use contains_eager so that your join is treated like a joinedload.

query = (session.query(Item)
                .join(Item.project)
                .join(Project.customer)
                .options(Load(Item).load_only('name', 'color'),
                         Load(Item).contains_eager("project").load_only('name'),
                         Load(Item).contains_eager("project").contains_eager("customer").load_only('name'))
                .filter(Customer.name == cust_name,
                        Project.name == proj_name)
                .distinct(Item.name))

This gives you the query

SELECT DISTINCT ON (items.name) customers.id AS customers_id, customers.name AS customers_name, projects.id AS projects_id, projects.name AS projects_name, items.id AS items_id, items.name AS items_name, items.color AS items_color 
FROM items JOIN projects ON projects.id = items._project_id JOIN customers ON customers.id = projects._customer_id 
WHERE customers.name = %(name_1)s AND projects.name = %(name_2)s AND items.color = %(color_1)s

Answer 1 (score: 1)

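I'm not sure what you are trying to achieve, but it looks like you want an inner join between the tables while selecting only specific columns.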

So I think you need to do something like this:

cust_name = u'Bob'
proj_name = u'job1'
item_color = u'blue'
query = (db.session.query(Item.name)
                       .join(Project, Customer)
                       .filter(Customer.name == cust_name,
                               Project.name == proj_name)
                       .distinct(Item.name))

# Select the loaded columns
query = query.add_columns(Item.name, Item.color, Project.name, Customer.name)

# some conditionals determining last filter, resolving to this one:
query = query.filter(Item.color == item_color)

result = query.all()

FWIW, I don't think this will bring any significant optimization to your query.