Suppose I have a Neo4j graph like this:
create (SK:Author {name:'Stephen King'}), (JK:Author {name:'J.K. Rowling'}), (DS:Author {name:'Dr. Seuss'}),
(TS:Book {name:'The Stand'}), (HP:Book {name:'Harry Potter'}), (CH:Book {name:'Cat in the Hat'}), (SHINING:Book {name:'The Shining'}),
(PAF:Genre {name:'Post-Apocalyptic fiction'}), (F:Genre {name:'Fantasy'}), (C:Genre {name:'Childrens'}), (HORROR:Genre {name:'Horror'}),
(SK)<-[:WRITTEN_BY]-(TS)-[:CATEGORIZED_AS]->(PAF),
(JK)<-[:WRITTEN_BY]-(HP)-[:CATEGORIZED_AS]->(F),
(DS)<-[:WRITTEN_BY]-(CH)-[:CATEGORIZED_AS]->(C),
(SK)<-[:WRITTEN_BY]-(SHINING)-[:CATEGORIZED_AS]->(HORROR)
Neo4j console link: http://console.neo4j.org/r/2d69kq
I have about 53,000 Author nodes, 6 million Book nodes and 9,000 Genre nodes.
For a query like this:
match (b:Book)-[:WRITTEN_BY]->(a:Author)
where a.name in ['Stephen King', 'J.K. Rowling']
with a, collect(b) as bs
unwind bs as book
match (g:Genre)<-[r:CATEGORIZED_AS]-(book)
where id(g) in [13, 14, 15, 16]
with a, count(distinct book) as book_count_author, collect(book) as bs
unwind bs as book
match (g:Genre)<-[r:CATEGORIZED_AS]-(book)
where id(g) in [13, 14, 15, 16]
return a.name, g.name, count(distinct book) as book_count_genre, book_count_author
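It takes about 12 seconds to complete. I've tried rewriting the query in several different ways and using index hints, but I can't figure out any way to make it faster. Any ideas? Obviously this example is simplified, but I do have indexes on the appropriate properties.
I need both aggregations. The first is the number of books per author, restricted to the genres specified in the second MATCH. The second is the number of books per author per genre, again restricted to the same genres.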
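To pin down exactly what the two counts should be, here is a minimal in-memory sketch in Python over the sample graph above (it does not talk to Neo4j). As assumptions, genre names stand in for the ids 13-16, and the hypothetical `aggregate` helper stores each book's genres as a set so that a book tagged with several selected genres is still counted once per author, mirroring `count(distinct book)`.

```python
from collections import defaultdict

# Sample graph from the CREATE statement in the question:
# book title -> (author, set of genres). A set is used because a real book
# may be CATEGORIZED_AS more than one genre (an assumption, not in the sample).
BOOKS = {
    "The Stand": ("Stephen King", {"Post-Apocalyptic fiction"}),
    "Harry Potter": ("J.K. Rowling", {"Fantasy"}),
    "Cat in the Hat": ("Dr. Seuss", {"Childrens"}),
    "The Shining": ("Stephen King", {"Horror"}),
}


def aggregate(books, authors, genres):
    """Compute the two counts the question asks for.

    book_count_author[a]     -> distinct books by author a in ANY selected genre
    book_count_genre[(a, g)] -> distinct books by author a in selected genre g
    """
    per_author = defaultdict(set)
    per_author_genre = defaultdict(set)
    for title, (author, book_genres) in books.items():
        if author not in authors:
            continue
        matched = book_genres & genres  # keep only the selected genres
        if matched:
            per_author[author].add(title)
        for genre in matched:
            per_author_genre[(author, genre)].add(title)
    book_count_author = {a: len(t) for a, t in per_author.items()}
    book_count_genre = {k: len(t) for k, t in per_author_genre.items()}
    return book_count_author, book_count_genre
```

With authors `{"Stephen King", "J.K. Rowling"}` and all four genres selected, Stephen King gets a book_count_author of 2 and J.K. Rowling gets 1, with a count of 1 for each (author, genre) pair.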
Answer 0 (score: 0)
Assuming you have an index (or constraint) on :Author(name), you can use this:
match (book:Book)-[:WRITTEN_BY]->(a:Author)
where a.name in ['Stephen King', 'J.K. Rowling']
with a, book
match (g:Genre)<-[:CATEGORIZED_AS]-(book)
return a, g, size((g)<-[:CATEGORIZED_AS]-()) as book_count_genre, count(book) as book_count_author
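Collecting and then unwinding can be expensive, so try to avoid it. Note that you can take the size() of a pattern to get the number of matches for that pattern. That count could not otherwise be obtained by counting the nodes in a given column, because the book column here only holds books by the selected authors.
Edit
Revised query for the clarified requirements:
book_count_genre - the number of books in that genre, by any author
book_count_author - the number of books in that genre written by the given author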
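The difference between those two columns can be modelled in a few lines of Python. This is a sketch of the semantics only, assuming one row per CATEGORIZED_AS relationship; `answer0_counts` is a hypothetical helper, not part of any Neo4j API. Note that the query as written applies no genre filter, so the sketch does not apply one either.

```python
from collections import Counter

# One row per CATEGORIZED_AS relationship in the sample graph:
# (book title, author name, genre name).
EDGES = [
    ("The Stand", "Stephen King", "Post-Apocalyptic fiction"),
    ("Harry Potter", "J.K. Rowling", "Fantasy"),
    ("Cat in the Hat", "Dr. Seuss", "Childrens"),
    ("The Shining", "Stephen King", "Horror"),
]


def answer0_counts(edges, authors):
    """Mimic answer 0's RETURN: (author, genre) -> (book_count_genre, book_count_author)."""
    # size((g)<-[:CATEGORIZED_AS]-()) sees every incoming edge of g, whatever the author.
    genre_total = Counter(genre for _, _, genre in edges)
    # count(book) only sees the rows that survived the author filter.
    author_genre = Counter((a, g) for _, a, g in edges if a in authors)
    return {(a, g): (genre_total[g], n) for (a, g), n in author_genre.items()}
```

If a hypothetical extra Fantasy book by Dr. Seuss is appended to `EDGES`, J.K. Rowling's Fantasy row becomes (2, 1): the genre total grows, while her own count does not.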
Answer 1 (score: 0)
What @InverseFalcon said.
You can optimize it further:
MATCH (g:Genre)<-[r:CATEGORIZED_AS]-(book:Book)-[:WRITTEN_BY]->(a:Author)
WHERE a.name in ['Stephen King', 'J.K. Rowling']
AND g.name IN ['Post-Apocalyptic fiction','Childrens','Horror','Fantasy']
WITH a, g, count(distinct book) as book_count_author_genre
RETURN a.name, collect({ genre: g.name, count: book_count_author_genre}),
sum(book_count_author_genre) as book_count_author
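You may need to use index hints (USING INDEX) for Author and Genre.
If you look the genres up by id instead of by name, it will be faster.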
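Answer 1's query can be modelled the same way: count distinct books per (author, genre), collect those into maps per author, and sum them. The Python sketch below is illustrative only, and `answer1_rows` is a hypothetical helper. One thing it makes visible is that, if a book can carry more than one of the selected genres, `sum(book_count_author_genre)` counts that book once per genre, so it can exceed the `count(distinct book)` per author from the question.

```python
from collections import defaultdict

# One row per CATEGORIZED_AS relationship: (book title, author name, genre name).
EDGES = [
    ("The Stand", "Stephen King", "Post-Apocalyptic fiction"),
    ("Harry Potter", "J.K. Rowling", "Fantasy"),
    ("Cat in the Hat", "Dr. Seuss", "Childrens"),
    ("The Shining", "Stephen King", "Horror"),
]


def answer1_rows(edges, authors, genres):
    """Mimic answer 1: per author, a list of {genre, count} maps plus their sum."""
    per_pair = defaultdict(set)
    for title, author, genre in edges:
        if author in authors and genre in genres:
            per_pair[(author, genre)].add(title)  # count(distinct book) per (a, g)
    collected = defaultdict(list)
    for (author, genre), titles in per_pair.items():
        collected[author].append({"genre": genre, "count": len(titles)})
    # collect(...) and sum(...) per author; maps are sorted only for stable output.
    return {
        author: (sorted(maps, key=lambda m: m["genre"]), sum(m["count"] for m in maps))
        for author, maps in collected.items()
    }
```

On the sample data the sum matches the distinct count (2 for Stephen King, 1 for J.K. Rowling). If one hypothetical book by Stephen King were tagged both Horror and Fantasy, the sum would report 4 while he would have only 3 distinct books.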