Neo4j: Query optimization / model validation

Posted: 2018-07-04 00:30:05

Tags: neo4j cypher

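I'm new to Neo4j, so apologies in advance if I'm doing something badly wrong. I'm trying to show users articles they might be interested in, based on the categories they have selected and the tags they like. My model in Neo4j looks like this: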

(:USER)-[:LIKES]->(:TAG)
(:ARTICLE)-[:PUBLISHED_BY]->(:PROVIDER)
(:ARTICLE)-[:HAS_CATEGORY]->(:CATEGORY)
(:USER)-[:DISLIKES]->(:ARTICLE)
(:USER)-[:INTERESTED_IN]->(:CATEGORY)
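To make the model concrete, here is a tiny hypothetical data set covering every label and relationship type listed above. The property names (`id`, `name`, `title`, `keywords`, `pubDate`) are taken from the queries in this question; all values are made up, and storing `pubDate` on the PUBLISHED_BY relationship as an ISO-8601 string is an assumption (such strings sort chronologically).

```cypher
// Hypothetical sample data for the model above; every value is made up.
CREATE (u:USER {id: 'u1'})
CREATE (t:TAG {name: 'neo4j'})
CREATE (c:CATEGORY {id: 'databases'})
CREATE (p:PROVIDER)
// keywords is a plain string, which is what CONTAINS in the queries expects
CREATE (a1:ARTICLE {id: 'a1', title: 'Graph modelling basics', keywords: 'neo4j, cypher, graphs'})
CREATE (a2:ARTICLE {id: 'a2', title: 'Another article', keywords: 'sql'})
CREATE (u)-[:LIKES]->(t)
CREATE (u)-[:INTERESTED_IN]->(c)
CREATE (a1)-[:HAS_CATEGORY]->(c)
CREATE (a2)-[:HAS_CATEGORY]->(c)
// pubDate lives on the relationship, as in pub.pubDate in the queries
CREATE (a1)-[:PUBLISHED_BY {pubDate: '2018-07-01'}]->(p)
CREATE (a2)-[:PUBLISHED_BY {pubDate: '2018-06-30'}]->(p)
CREATE (u)-[:DISLIKES]->(a2)
```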

When I run the following query I get the results I want, but the query takes 16-18 seconds to execute.

MATCH (u:USER {id: $userid})-[:LIKES]->(t:TAG) 
WITH u, t, collect(t.name) AS tags
UNWIND tags AS tag
WITH u, tag
MATCH (c:CATEGORY)<-[*]-(a:ARTICLE)-[pub:PUBLISHED_BY]->(p:PROVIDER) 
WHERE (a.keywords CONTAINS tag OR c.id IN $categoryArray)
  AND NOT (u)-[:DISLIKES]->(a)
RETURN DISTINCT a.id AS id, a.title AS title, pub.pubDate 
ORDER BY pub.pubDate DESC LIMIT 250

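Is there a faster or better way to get these results? Note: I'm using Neo4j 3.4.1 on an Ubuntu machine, with a 512 MB page cache and a minimum and maximum heap size of 1500 MB.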

2 Answers:

Answer 0 (score: 0)

It would be better if your model connected articles to tag nodes instead of matching tag names against a keywords string.

This part, `a.keywords CONTAINS tag`, cannot be answered from an index, so it results in a full scan.
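Turning the keywords string into tag relationships could look roughly like the one-off migration below. It is a sketch that assumes `keywords` is a plain string (which is what CONTAINS requires) and reuses the HAS_TAG relationship type from the query in this answer. Because it keeps the substring semantics of the original query, a tag such as `java` would also be linked to an article whose keywords only mention `javascript`. It compares every article with every tag, so on a large graph you may need to run it in batches.

```cypher
// One-off migration sketch: link each ARTICLE to every TAG whose name occurs
// in its keywords string. Run once, not per request (cartesian product of tags x articles).
MATCH (t:TAG), (a:ARTICLE)
WHERE a.keywords CONTAINS t.name   // null keywords evaluate to null and are skipped
MERGE (a)-[:HAS_TAG]->(t)          // MERGE avoids duplicate relationships on re-runs
```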

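Also, the path from a category to an article may be a long chain, so give the variable-length pattern a relationship type and an upper bound. It is better to start from the articles found through the tags and then check those against the categories.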

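MATCH (c:CATEGORY) WHERE c.id IN $categoryArray
WITH collect(c) AS categories
MATCH (u:USER {id: $userid})-[:LIKES]->(tag:TAG)
MATCH (a:ARTICLE)-[:HAS_TAG]->(tag)
WITH DISTINCT u, a, categories
// *..5 is an illustrative upper bound; set it to the real depth of the category chain
WHERE any(c IN categories WHERE shortestPath((c)<-[:HAS_CATEGORY*..5]-(a)) IS NOT NULL)
  AND NOT (u)-[:DISLIKES]->(a)
MATCH (a)-[pub:PUBLISHED_BY]->(p:PROVIDER)
RETURN DISTINCT a.id AS id, a.title AS title, pub.pubDate
ORDER BY pub.pubDate DESC LIMIT 250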
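If categories are not nested, that is, each article points straight at its category through HAS_CATEGORY as in the model in the question, the shortestPath check could be replaced by a single-hop pattern. A sketch under that assumption, which also assumes the HAS_TAG relationships described above exist:

```cypher
// Assumes a flat category model: (:ARTICLE)-[:HAS_CATEGORY]->(:CATEGORY), one hop.
MATCH (u:USER {id: $userid})-[:LIKES]->(:TAG)<-[:HAS_TAG]-(a:ARTICLE)-[:HAS_CATEGORY]->(c:CATEGORY)
WHERE c.id IN $categoryArray
WITH DISTINCT u, a                    // an article can match several tags/categories
WHERE NOT (u)-[:DISLIKES]->(a)
MATCH (a)-[pub:PUBLISHED_BY]->(:PROVIDER)
RETURN DISTINCT a.id AS id, a.title AS title, pub.pubDate AS pubDate
ORDER BY pubDate DESC LIMIT 250
```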

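Also run the query with PROFILE to inspect the query plan and see whether there are bottlenecks or unindexed fields (in the plan view you can expand the boxes with the double arrow in the bottom-right corner).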
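For the index part, the value lookups in these queries are on USER.id and CATEGORY.id. A sketch using Neo4j 3.4 syntax (the version in the question); which properties to index is an assumption based on the queries above. PROFILE is simply written in front of the query to be inspected, and each statement should be run separately.

```cypher
// Neo4j 3.x schema syntax; 4.x and later use CREATE INDEX FOR (n:USER) ON (n.id).
CREATE INDEX ON :USER(id);
CREATE INDEX ON :CATEGORY(id);

// PROFILE executes the query and reports rows and db hits per plan operator.
// $userid must be supplied as a parameter, as in the original query.
PROFILE
MATCH (u:USER {id: $userid})-[:LIKES]->(t:TAG)
RETURN count(t);
```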

Answer 1 (score: 0)

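Thanks @Michael. I understand that making tags separate nodes related to articles would make the search faster, but the query below already reduced the search time from 16-18 seconds to 3-4 seconds.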

MATCH (u:USER {id: $userId})-[:INTERESTED_IN]->(c:CATEGORY)<-[*]-(a:ARTICLE)-[pub:PUBLISHED_BY]->(p:PROVIDER)
WHERE NOT (u)-[:DISLIKES]->(a)
RETURN DISTINCT a.id, a.title, pub.pubDate
ORDER BY pub.pubDate DESC LIMIT 150
UNION
MATCH (u:USER {id: $userId})-[:LIKES]->(t:TAG)
WITH u, t, collect(t.name) AS tags
UNWIND tags AS tag
MATCH (a:ARTICLE)-[pub:PUBLISHED_BY]->(:PROVIDER)
WHERE a.keywords CONTAINS tag AND NOT (u)-[:DISLIKES]->(a)
RETURN DISTINCT a.id, a.title, pub.pubDate
ORDER BY pub.pubDate DESC LIMIT 150
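In Neo4j 3.4 the ORDER BY and LIMIT in each UNION branch apply to that branch only, so this returns up to 300 rows that are not sorted as one list. One way to merge both sources and sort once is to collect the candidate articles first. This is a sketch that assumes the HAS_TAG relationships from the first answer exist and that articles reach their category in a single HAS_CATEGORY hop, as in the model in the question.

```cypher
MATCH (u:USER {id: $userId})
// Articles in the user's categories (assumes one HAS_CATEGORY hop)
OPTIONAL MATCH (u)-[:INTERESTED_IN]->(:CATEGORY)<-[:HAS_CATEGORY]-(catArticle:ARTICLE)
WITH u, collect(DISTINCT catArticle) AS byCategory   // collect() drops the null from a failed OPTIONAL MATCH
// Articles tagged with the user's liked tags (assumes HAS_TAG exists)
OPTIONAL MATCH (u)-[:LIKES]->(:TAG)<-[:HAS_TAG]-(tagArticle:ARTICLE)
WITH u, byCategory, collect(DISTINCT tagArticle) AS byTag
UNWIND byCategory + byTag AS a
WITH DISTINCT u, a
WHERE NOT (u)-[:DISLIKES]->(a)
MATCH (a)-[pub:PUBLISHED_BY]->(:PROVIDER)
RETURN DISTINCT a.id AS id, a.title AS title, pub.pubDate AS pubDate
ORDER BY pubDate DESC LIMIT 250                      // one sort over the merged set
```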