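I need to count how many times each of a composer's pieces was performed per decade, and then return only the most-performed piece for each decade.
This Cypher does everything except filter out all but the highest count per decade.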
match (c:Composer)-[:CREATED_BY]-(w:Work)<-[*..2]-(prog:Program)
WHERE c.lastname =~ '(?i).*stravinsky.*'
WITH w.title AS Title, prog.title AS Program, LEFT(prog.date, 3)+"0" AS Decade
RETURN Decade, Title, COUNT(Program) AS Total
ORDER BY Decade, Total DESC, Title
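I've been banging away at this for hours and can't find a solution.
Answer 0 (score: 4)
This seems to return what you're looking for, but it could probably be improved.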
MATCH (c:Composer)-[r:CREATED_BY]-(w:Work)<-[*..2]-(prog:Program)
WHERE c.lastname =~ '(?i).*stravinsky.*'
WITH LEFT(prog.date, 3)+"0" AS Decade, w.title AS Title, COUNT(prog.title) AS Total
ORDER BY Decade, Total DESC, Title
RETURN Decade, HEAD(COLLECT(Total)) AS Total, HEAD(COLLECT(Title)) AS Title
ORDER BY Decade
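It returns only one result per decade but does not account for ties, so it feels a bit incomplete to me. I'll think about how to handle that and edit if I come up with something good.
I used this string with http://graphgen.neoxygen.io to generate sample data locally.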
(c:Composer {firstname: firstName, lastname: lastName} *10)<-[:CREATED_BY *n..1]-(w:Work {title: progLanguage} *75)<-[:PERFORMED *n..1]-(prog:Program {title: catchPhrase, date: date} *400)
VICTORY EDIT
Here is a rough version of the query above that shows multiple Works when there is a tie.
MATCH (c:Composer)-[r:CREATED_BY]-(w:Work)<-[*..2]-(prog:Program)
WHERE c.lastname =~ '(?i).*stravinsky.*'
WITH LEFT(prog.date, 3)+"0" AS Decade, w.title AS Title, COUNT(prog.title) AS Total
ORDER BY Decade, Total DESC, Title
WITH Decade, Title, Total, HEAD(COLLECT(Total)) AS PerformedTotal
WITH Decade, [title in COLLECT(Title) WHERE Total = PerformedTotal] as Title, Total, PerformedTotal
ORDER BY PerformedTotal DESC
return Decade, HEAD(COLLECT(PerformedTotal)) as Totals, HEAD(COLLECT(Title)) as Titles
ORDER BY Decade
I feel like it should be possible to refactor this, but I can't seem to simplify it.
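One possible simplification, offered as an untested sketch rather than a verified replacement: take the per-decade maximum with MAX() and keep every title whose count equals it. The MATCH and WHERE are copied from the query above; the names Top and rows and the map keys title and total are made up for illustration, and it assumes your Cypher version accepts map literals inside COLLECT.

```cypher
MATCH (c:Composer)-[r:CREATED_BY]-(w:Work)<-[*..2]-(prog:Program)
WHERE c.lastname =~ '(?i).*stravinsky.*'
// One row per (Decade, Title) with its performance count
WITH LEFT(prog.date, 3) + "0" AS Decade, w.title AS Title, COUNT(prog.title) AS Total
// Group by Decade only: highest count plus every (title, total) pair
// "Top" and "rows" are hypothetical names chosen for this sketch
WITH Decade, MAX(Total) AS Top, COLLECT({title: Title, total: Total}) AS rows
// Keep all titles that reach the decade's maximum, so ties survive
RETURN Decade, Top AS Total, [row IN rows WHERE row.total = Top | row.title] AS Titles
ORDER BY Decade
```

Because the maximum is computed by aggregation, this version does not rely on rows arriving in sorted order before COLLECT.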
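I took a lot of notes while writing this answer. Here's the TL;DR, even if it isn't what you're looking for, because it's still interesting.
Changing <-[*..2]- to anything else causes the query to blow up.
If you set the Cypher query planner to Cypher 2.1, performance is best when the first line is MATCH (c:Composer)-[r:CREATED_BY]-(w)<-[r2:REL_TYPE]-(prog). Use a label only on the first node, to help WHERE do its job. Always, always use node and rel identifiers.
[title in COLLECT(Title) WHERE Total = PerformedTotal] uses variables from the same row. If I pull them out, it blows up.
More surprising, I couldn't refactor it the way I expected. I expected this to work, but it doesn't: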
MATCH (c:Composer)-[r:CREATED_BY]-(w:Work)<-[*..2]-(prog:Program)
WHERE c.lastname =~ '(?i).*stravinsky.*'
WITH LEFT(prog.date, 3)+"0" AS Decade, w.title AS Title, COUNT(prog.title) AS Total
ORDER BY Decade, Total DESC, Title
WITH Decade, [title in COLLECT(Title) WHERE Total = HEAD(COLLECT(Total))] as Title, Total, HEAD(COLLECT(Total)) AS PerformedTotal
ORDER BY PerformedTotal DESC
return Decade, HEAD(COLLECT(PerformedTotal)) as Totals, HEAD(COLLECT(Title)) as Titles
ORDER BY Decade
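ANOTHER EDIT: how you might speed this up
If your query has a few potential paths but you want to avoid [*..2], you can speed it up by spelling out the paths it should follow when trying to find a match. Whether this is faster really depends on how many dead-end branches the variable-length version could wander down. If you give it only two or three paths so that it can completely ignore six other relationships, that will probably offset the cost of the filtering and everything else that happens later. Of course, if the paths are complex enough, this may be more trouble than it's worth.
You should pop it into neo4j-shell, prefix the query with PROFILE, add a semicolon at the end, and look at the number of database hits to determine which approach works best for your dataset.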
MATCH (c:Composer)-[r:CREATED_BY]-(w)
WHERE c.lastname =~ '(?i).*Denesik.*'
OPTIONAL MATCH (w)-[r2:CONNECTED_TO]-(this_node)<-[r3:ONE_MORE]-(prog1)
OPTIONAL MATCH (w)<-[r4:PERFORMED]-(prog2)
OPTIONAL MATCH (w)-[r5:THIS_REL]->(this_node)-[r6:AGAIN_WITH_THE_RELS]->(prog3)
WITH FILTER(program in [prog1, prog2, prog3] WHERE program IS NOT NULL) AS progarray, w.title AS Title
UNWIND(progarray) as prog
WITH LEFT(prog.date, 3)+"0" AS Decade, COUNT(prog.title) AS Total, Title
ORDER BY Decade, Total DESC, Title
WITH Decade, Title, Total, HEAD(COLLECT(Total)) AS PerformedTotal
WITH Decade, [title in COLLECT(Title) WHERE Total = PerformedTotal] as Title, Total, PerformedTotal
ORDER BY PerformedTotal DESC
return Decade, HEAD(COLLECT(PerformedTotal)) as Totals, HEAD(COLLECT(Title)) as Titles
ORDER BY Decade;
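The trickiest part is that if we reuse the prog variable, it drags the results of each OPTIONAL MATCH into the next one, essentially trying to filter, and we won't get fully independent paths. (Why we're able to reuse w now is beyond me...) That's fine, though. We take the results, put them into an array, filter out the nulls, and then unwind it back into a single variable containing all the valid results. After that, we carry on as normal.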
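The collect, filter, unwind step can be seen in isolation with literal values. This is only a sketch: the names candidates, kept, and value are made up, and it uses a list comprehension in place of FILTER(), which was removed in Neo4j 4.0.

```cypher
// Stand-in for [prog1, prog2, prog3], where the second OPTIONAL MATCH found nothing
WITH [1, null, 3] AS candidates
// Drop the nulls left behind by unmatched OPTIONAL MATCH clauses
WITH [value IN candidates WHERE value IS NOT NULL] AS kept
// Turn the list back into one row per surviving element
UNWIND kept AS value
RETURN value
```

This returns one row for each non-null element, which is the same shape the main query feeds into its decade aggregation.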
In my tests, this seemed to improve speed significantly with the right dataset. YMMV.
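To compare the approaches as suggested above, a profiled run in neo4j-shell might look like the following sketch, which simply prefixes the original counting query with PROFILE and ends it with a semicolon. Run the explicit-path variant the same way and compare the database hits reported in each plan; the numbers depend entirely on your data.

```cypher
// PROFILE executes the query and prints the plan with per-operator database hits
PROFILE
MATCH (c:Composer)-[r:CREATED_BY]-(w:Work)<-[*..2]-(prog:Program)
WHERE c.lastname =~ '(?i).*stravinsky.*'
RETURN LEFT(prog.date, 3) + "0" AS Decade, w.title AS Title, COUNT(prog.title) AS Total
ORDER BY Decade, Total DESC, Title;
```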