How to improve the performance of two chained aggregations in a Neo4j query

Date: 2016-08-16 16:25:43

Tags: neo4j

Suppose I have a Neo4j graph like this:


create (SK:Author {name:'Stephen King'}),
  (JK:Author {name:'J.K. Rowling'}),
  (DS:Author {name:'Dr. Seuss'}),
  (TS:Book {name:'The Stand'}),
  (HP:Book {name:'Harry Potter'}),
  (CH:Book {name:'Cat in the Hat'}),
  (SHINING:Book {name:'The Shining'}),
  (PAF:Genre {name:'Post-Apocalyptic fiction'}),
  (F:Genre {name:'Fantasy'}),
  (C:Genre {name:'Childrens'}),
  (HORROR:Genre {name:'Horror'}),
  (SK)<-[:WRITTEN_BY]-(TS)-[:CATEGORIZED_AS]->(PAF),
  (JK)<-[:WRITTEN_BY]-(HP)-[:CATEGORIZED_AS]->(F),
  (DS)<-[:WRITTEN_BY]-(CH)-[:CATEGORIZED_AS]->(C),
  (SK)<-[:WRITTEN_BY]-(SHINING)-[:CATEGORIZED_AS]->(HORROR)

Neo4j console link: http://console.neo4j.org/r/2d69kq

I have about 53,000 Author nodes, 6 million Book nodes, and 9,000 Genre nodes.

For a query like this:

match (b:Book)-[:WRITTEN_BY]->(a:Author)
where a.name in ['Stephen King', 'J.K. Rowling']
with a, collect(b) as bs
unwind bs as book
match (g:Genre)<-[r:CATEGORIZED_AS]-(book)
where id(g) in [13, 14, 15, 16]
with a, count(distinct book) as book_count_author, collect(book) as bs
unwind bs as book
match (g:Genre)<-[r:CATEGORIZED_AS]-(book)
where id(g) in [13, 14, 15, 16]
return a.name, g.name, count(distinct book) as book_count_genre, book_count_author

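It takes about 12 seconds to complete. I have tried rewriting the query in several different ways and using index hints, but I can't find any way to make it faster. Any ideas? Obviously this example is simplified, but I do have indexes on the appropriate properties.

The original post showed a sample result of the chained aggregation as an image, which is not available here.

I need two aggregations. The first is the number of books per author, restricted to the genres specified in the second MATCH. The second is the number of books per author per genre, again restricted to the same genres.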

2 answers:

Answer 0: (score: 0)

Assuming you have an index (or a uniqueness constraint, which is backed by an index) on :Author(name), you can use this:

match (book:Book)-[:WRITTEN_BY]->(a:Author)
where a.name in ['Stephen King', 'J.K. Rowling']
with a, book
match (g:Genre)<-[:CATEGORIZED_AS]-(book)
return a, g, size((g)<-[:CATEGORIZED_AS]-()) as book_count_genre, count(book) as book_count_author

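Collecting and unwinding can be expensive, so try to avoid them. Note that you can take the size of a pattern to get a count of its matches; this covers counts that you could not otherwise get by counting the nodes in a given column.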
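As a rough illustration of the pattern-size idea, the sketch below counts the books attached to one genre of the sample graph without any collect/unwind step. The first statement uses the same size() form as the answer. The second uses a COUNT { } subquery, which as far as I know is the replacement on Neo4j 5 and later, where size() over a pattern is no longer accepted. The choice of 'Horror' is only for illustration, and the two statements are meant to be run separately.

```cypher
// Neo4j 3.x/4.x form, as used in the answer above:
// size() of a pattern expression gives the number of matches.
MATCH (g:Genre {name: 'Horror'})
RETURN g.name, size((g)<-[:CATEGORIZED_AS]-()) AS book_count_genre;

// Assumed equivalent on Neo4j 5+ (COUNT subquery expression).
MATCH (g:Genre {name: 'Horror'})
RETURN g.name, COUNT { (g)<-[:CATEGORIZED_AS]-() } AS book_count_genre;
```

Either form counts relationships directly from the genre node, so no intermediate list of books is built.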

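EDIT

Modified the query for the clarified requirements:

book_count_genre - the number of books by any author that have that genre

book_count_author - the number of books written by the given author that have that genre

Answer 1: (score: 0)

What @InverseFalcon said.

You can optimize it further: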

MATCH (g:Genre)<-[r:CATEGORIZED_AS]-(book:Book)-[:WRITTEN_BY]->(a:Author) 
WHERE a.name in ['Stephen King', 'J.K. Rowling'] 
  AND g.name IN ['Post-Apocalyptic fiction','Childrens','Horror','Fantasy'] 
WITH a, g, count(distinct book) as  book_count_author_genre 
RETURN a.name, collect({ genre: g.name, count: book_count_author_genre}), 
       sum(book_count_author_genre) as book_count_author

You may need to use index hints for Author and Genre.
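A minimal sketch of what those hints could look like, assuming indexes exist on :Author(name) and :Genre(name). The CREATE INDEX statements use the Neo4j 3.x syntax that was current when this was asked (newer versions use CREATE INDEX FOR ... ON ...), and they should be run separately from the query. Supplying two hints forces two starting points that the planner then has to join, so whether this helps depends on the data; running the query with PROFILE is the way to check.

```cypher
// Assumed schema setup (Neo4j 3.x syntax); run once, separately.
CREATE INDEX ON :Author(name);
CREATE INDEX ON :Genre(name);

// Same query as above, with explicit index hints for both lookups.
MATCH (g:Genre)<-[:CATEGORIZED_AS]-(book:Book)-[:WRITTEN_BY]->(a:Author)
USING INDEX a:Author(name)
USING INDEX g:Genre(name)
WHERE a.name IN ['Stephen King', 'J.K. Rowling']
  AND g.name IN ['Post-Apocalyptic fiction', 'Childrens', 'Horror', 'Fantasy']
WITH a, g, count(DISTINCT book) AS book_count_author_genre
RETURN a.name,
       collect({genre: g.name, count: book_count_author_genre}) AS genres,
       sum(book_count_author_genre) AS book_count_author;
```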

It will be even faster if you use ids for the lookup.
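For example, the genre filter could be swapped for the id-based one from the question. This is only a sketch: the ids 13 to 16 are taken from the question's query, and internal node ids can be reused after deletions, so they are best treated as short-lived handles rather than stable keys.

```cypher
// Genre lookup by internal node id instead of by name.
// The ids below come from the question and will differ per database.
MATCH (g:Genre)<-[:CATEGORIZED_AS]-(book:Book)-[:WRITTEN_BY]->(a:Author)
WHERE a.name IN ['Stephen King', 'J.K. Rowling']
  AND id(g) IN [13, 14, 15, 16]
WITH a, g, count(DISTINCT book) AS book_count_author_genre
RETURN a.name,
       collect({genre: g.name, count: book_count_author_genre}) AS genres,
       sum(book_count_author_genre) AS book_count_author;
```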