Limiting a Cartesian product in Neo4j: query optimization

Time: 2018-11-08 13:03:01

Tags: graph neo4j cypher

Given the following schema:

[image: schema diagram]

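I need to get a list of Collections, each with a set of ProductCards (a ProductCard is a variant of a Product) that matches criteria specified by the user:

  • collection type
  • 1..4 product types (for each selected ProductType, the set must contain one ProductCard)
  • price of the whole set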

I started with a query like this:

MATCH (c:Collection {type: 'selected_collection_type'})<-[:FROM_COLLECTION]-(:Product)-[:OF_TYPE]->(pt1:ProductType {title: '1st product type'}),
      (c)<-[:FROM_COLLECTION]-(:Product)-[:OF_TYPE]->(pt2:ProductType {title: '2nd product type'}),
      (c)<-[:FROM_COLLECTION]-(:Product)-[:OF_TYPE]->(pt3:ProductType {title: '3rd product type'}),
      (c)<-[:FROM_COLLECTION]-(:Product)-[:OF_TYPE]->(pt4:ProductType {title: '4th product type'})
CALL apoc.cypher.run('
WITH {c} AS c, {pt1} AS pt1, {pt2} AS pt2, {pt3} AS pt3, {pt4} AS pt4
    MATCH (pt1)<-[:OF_TYPE]-(p1:Product)-[:FROM_COLLECTION]->(c),
          (pt2)<-[:OF_TYPE]-(p2:Product)-[:FROM_COLLECTION]->(c),
          (pt3)<-[:OF_TYPE]-(p3:Product)-[:FROM_COLLECTION]->(c),
          (pt4)<-[:OF_TYPE]-(p4:Product)-[:FROM_COLLECTION]->(c),
          (pc1:ProductCard)-[:VARIANT_OF]->(p1),
          (pc2:ProductCard)-[:VARIANT_OF]->(p2),
          (pc3:ProductCard)-[:VARIANT_OF]->(p3),
          (pc4:ProductCard)-[:VARIANT_OF]->(p4)
    WHERE (pc1.price + pc2.price + pc3.price + pc4.price < price_margin_for_set)
    RETURN pc1, pc2, pc3, pc4, (p1.weight + p2.weight + p3.weight + p4.weight) AS sweight ORDER BY sweight DESC LIMIT 1
', {c:c, pt1:pt1, pt2:pt2, pt3:pt3, pt4:pt4}) YIELD value
RETURN c, value ORDER BY value.sweight DESC LIMIT 8;

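This works for up to 3 selected product types, but it slows down dramatically when I add a 4th. The problem is that I only need 1 set back from the subquery, yet the Cartesian product computed over all product variants (a Product can have 1..~10 ProductCards) becomes very large for 4 types.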
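To put rough numbers on that growth, here is a back-of-the-envelope sketch in Python. The figures are assumptions for illustration only: 5 Products per type within a Collection is hypothetical, and 10 ProductCards per Product is the upper end of the range mentioned above. The function name `combinations_examined` is also made up for this sketch.

```python
def combinations_examined(products_per_type, cards_per_product, n_types):
    """Size of the Cartesian product of ProductCards for one Collection.

    Assumes every selected type has the same number of Products and every
    Product has the same number of ProductCards (a simplification).
    """
    cards_per_type = products_per_type * cards_per_product
    return cards_per_type ** n_types


if __name__ == "__main__":
    # Hypothetical sizes: 5 Products per type, 10 ProductCards per Product.
    for n_types in range(1, 5):
        print(n_types, combinations_examined(5, 10, n_types))
```

Under these assumptions each extra type multiplies the work by 50, so the step from 3 to 4 types takes the subquery from 125,000 candidate sets to 6,250,000 per Collection, all of which are built before `ORDER BY ... LIMIT 1` discards everything but one.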

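How can I optimize this query's performance, or reduce the number of combinations that have to be examined for the subquery to return 1 set matching the price criterion?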

Here is the EXPLAIN plan: [image: explain]

Edit: I changed the query slightly

WITH ['Product Type 1', 'Product Type 2', 'Product Type 3', 'Product Type 4'] as types
MATCH (c:Collection)<-[:FROM_COLLECTION]-(:Product)-[:OF_TYPE]->(pt:ProductType)
WHERE pt.title in types AND c.type = 'collection type'
WITH c, size(types) as inputCnt, count(DISTINCT pt) as cnt
WHERE cnt = inputCnt
CALL apoc.cypher.run('
WITH {c} AS c 
MATCH (c)<-[:FROM_COLLECTION]-(p1:Product)-[:OF_TYPE]->(:ProductType {title: "Product Type 1"})
MATCH (pc1:ProductCard)-[:VARIANT_OF]->(p1)
MATCH (c)<-[:FROM_COLLECTION]-(p2:Product)-[:OF_TYPE]->(:ProductType {title: "Product Type 2"})
MATCH (pc2:ProductCard)-[:VARIANT_OF]->(p2)
MATCH (c)<-[:FROM_COLLECTION]-(p3:Product)-[:OF_TYPE]->(:ProductType {title: "Product Type 3"})
MATCH (pc3:ProductCard)-[:VARIANT_OF]->(p3)
MATCH (c)<-[:FROM_COLLECTION]-(p4:Product)-[:OF_TYPE]->(:ProductType {title: "Product Type 4"})
MATCH (pc4:ProductCard)-[:VARIANT_OF]->(p4)
WHERE (pc1.price + pc2.price + pc3.price + pc4.price < 1000)
RETURN pc1, pc2, pc3, pc4, (p1.weight + p2.weight + p3.weight + p4.weight) AS sweight ORDER BY sweight DESC LIMIT 1
', {c:c}) YIELD value
RETURN DISTINCT c, value LIMIT 8;

EXPLAIN for the whole query:

[image: whole query]

EXPLAIN for the subquery:

[image: subquery]

1 answer:

Answer 0: (score: 0)

The simplest way to break up the Cartesian product is to add logic that makes one set clearly win over the others. For example:

WHERE p1.value > p2.value > p3.value > p4.value

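However, by its nature this query will still produce many sets that differ only slightly from one another. I would just return WHERE p1.value < 1000 WITH COLLECT(p1) AS possible and handle the rest client-side, to minimize data transfer. (If you really need to, you can sort p1 by value while collecting, sum the 3 smallest values, and then filter out anything with a value greater than 1000 - smallest_set, but that feels like overkill.)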
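A minimal sketch of what that client-side step could look like in Python, assuming the database has already returned, for one Collection, a list of candidate cards per product type. Everything named here is hypothetical: the function `best_set`, the `(price, weight)` tuple layout, and the idea that each card carries its Product's weight. The pruning follows the parenthetical above: a card is skipped when its price plus the cheapest possible choice for the remaining types already reaches the limit.

```python
def best_set(cards_by_type, price_limit):
    """Pick one card per type, maximizing total weight with total price < price_limit.

    cards_by_type: list (one entry per product type) of lists of
                   (price, weight) tuples. Hypothetical layout.
    Returns (total_weight, chosen_cards) or None if no set fits.
    """
    if any(not cards for cards in cards_by_type):
        return None  # some type has no candidates at all

    n = len(cards_by_type)
    # min_rest[i] = cheapest possible total price for types i..n-1
    min_rest = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_rest[i] = min_rest[i + 1] + min(price for price, _ in cards_by_type[i])

    best = None

    def search(i, spent, weight, chosen):
        nonlocal best
        if i == n:
            if best is None or weight > best[0]:
                best = (weight, list(chosen))
            return
        for price, card_weight in cards_by_type[i]:
            # Prune: even the cheapest remaining cards would reach the limit.
            if spent + price + min_rest[i + 1] >= price_limit:
                continue
            chosen.append((price, card_weight))
            search(i + 1, spent + price, weight + card_weight, chosen)
            chosen.pop()

    search(0, 0, 0, [])
    return best


if __name__ == "__main__":
    cards = [
        [(100, 5), (400, 9)],  # type 1
        [(300, 4), (700, 8)],  # type 2
        [(200, 1)],            # type 3
        [(150, 2)],            # type 4
    ]
    print(best_set(cards, 1000))
```

The search is still exponential in the worst case. The bound only cuts branches that cannot fit under the price limit, which helps most when the limit is tight relative to typical card prices.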