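After reading https://markorodriguez.com/2011/09/22/a-graph-based-movie-recommender-engine/, I am working on movie recommendations with the MovieLens 20M dataset. Movie nodes are connected to Genre nodes by hasGenre relationships, and User nodes are connected to Movie nodes by hasRating relationships. For a query movie (for example Toy Story), I am trying to retrieve the movies that share all of Toy Story's genres and have the highest number of co-ratings with it, counting only ratings above 3.0. This is my Cypher query: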
MATCH (inputMovie:Movie {movieId: 1})-[r:hasGenre]-(h:Genre)
WITH inputMovie, COLLECT (h) as inputGenres
MATCH (inputMovie)<-[r:hasRating]-(User)-[o:hasRating]->(movie)-[:hasGenre]->(genre)
WITH inputGenres, r, o, movie, COLLECT(genre) AS genres
WHERE ALL(h in inputGenres where h in genres) and (r.rating>3 and o.rating>3)
RETURN movie.title,movie.movieId, count(*)
ORDER BY count(*) DESC
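However, my system does not seem able to handle it (16 GB of RAM, a 4th-gen Core i7, and an SSD). When I run the query, RAM usage climbs to 97% and then Neo4j shuts down unexpectedly (possibly because of the heap size, or possibly because of the amount of RAM).
Thanks.
Answer 0 (score: 0)
First, you can simplify the Cypher so that it can be planned more efficiently: match only what is needed and handle the rest in the WHERE clause, so that filtering can happen during matching.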
MATCH (inputMovie:Movie {movieId: 1})-[r:hasGenre]->(h:Genre)
WITH inputMovie, COLLECT (h) as inputGenres
MATCH (inputMovie)<-[r:hasRating]-(User)-[o:hasRating]->(movie)
WHERE (r.rating>3 and o.rating>3) AND ALL(genre in inputGenres WHERE (movie)-[:hasGenre]->(genre))
RETURN movie.title,movie.movieId, count(*)
ORDER BY count(*) DESC
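If that is still too heavy, one variation that may help is to count the co-ratings per movie first and only then apply the genre check, so the check runs once per candidate movie rather than once per pair of ratings. This is a sketch under the same labels, relationship types, and properties as above. I have not measured it on this dataset. The alias coRatings and the LIMIT 25 are my own additions, and the movie <> inputMovie condition is needed because the two ratings are now matched in separate MATCH clauses.

```cypher
// Sketch only: assumes the same schema as the queries above.
MATCH (inputMovie:Movie {movieId: 1})-[:hasGenre]->(h:Genre)
WITH inputMovie, collect(h) AS inputGenres
// Users who rated the input movie above 3
MATCH (inputMovie)<-[r:hasRating]-(user)
WHERE r.rating > 3
WITH inputMovie, inputGenres, user
// Other movies those users also rated above 3
MATCH (user)-[o:hasRating]->(movie)
WHERE o.rating > 3 AND movie <> inputMovie
// Aggregate first: one row per candidate movie instead of one row per rating pair
WITH inputGenres, movie, count(*) AS coRatings
// The genre check now runs once per candidate movie
WHERE ALL(genre IN inputGenres WHERE (movie)-[:hasGenre]->(genre))
RETURN movie.title, movie.movieId, coRatings
ORDER BY coRatings DESC
LIMIT 25 // arbitrary cut-off (assumption); remove it to get every movie
```

Prefixing either version with PROFILE shows how many rows flow through each step, which should tell you whether the reordering actually reduces the work on your data.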
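Now, if you don't mind adding data to the graph to make the lookup easier, another thing you can do is split the query into several smaller parts and "cache" the results as relationships. For example: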
// Cypher 1
MATCH (inputMovie:Movie {movieId: 1})-[r:hasGenre]->(h:Genre)
WITH inputMovie, COLLECT (h) as inputGenres
MATCH (movie:Movie)
WHERE ALL(genre in inputGenres WHERE (movie)-[:hasGenre]->(genre))
// Merge so that multiple runs don't create extra copies
MERGE (inputMovie)-[:isLike]->(movie)
// Cypher 2
MATCH (movie:Movie)<-[r:hasRating]-(user)
WHERE r.rating>3
// Merge so that multiple runs don't create extra copies
MERGE (user)-[:reallyLikes]->(movie)
// Cypher 3
MATCH (inputMovie:Movie{movieId: 1})<-[:reallyLikes]-(user)-[:reallyLikes]->(movie:Movie)<-[:isLike]-(inputMovie)
RETURN movie.title,movie.movieId, count(*)
ORDER BY count(*) DESC
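One caveat: on the 20M dataset, Cypher 2 creates a very large number of reallyLikes relationships in a single transaction, which could itself run into the heap limit. A possible workaround, assuming the APOC plugin is installed (it is not part of a default Neo4j installation), is to commit the writes in batches with apoc.periodic.iterate. The batch size of 10000 below is an arbitrary starting point, not a tuned value.

```cypher
// Sketch only: requires the APOC plugin (assumption).
// Batched variant of Cypher 2: each batch of 10000 rows is committed separately.
CALL apoc.periodic.iterate(
  "MATCH (user)-[r:hasRating]->(movie:Movie) WHERE r.rating > 3 RETURN user, movie",
  "MERGE (user)-[:reallyLikes]->(movie)",
  {batchSize: 10000, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
```

The parallel option is left off here because parallel batches that touch the same user or movie nodes can contend for locks on them.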