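I am using NEO4j 3.5 to store and query relationships between people. I have nodes with the label "User" and relationships of type "FRIENDS". The query that finds friends works, but it takes too long: it currently returns in 4 to 6 seconds. This is not a high-transaction neo4j database, and the server has plenty of available CPU and memory. The load on the server is under 3, and there are 8 cores. It runs on an AWS EC2 instance. There are approximately 250,000 nodes in the database, and the total database size is under 750 MB.
This is the query I am currently using: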
MATCH (user:User {user_id:1145})-[:FRIENDS*3]->(fof:User)
WHERE NOT (user:User)-[:FRIENDS]->(fof:User)
RETURN count(distinct fof.user_id)
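This Cypher query returns a count of 69,704, which is correct.
What optimizations can be made to the Cypher query or to the NEO4j database engine to return results faster?
Execution plan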
+-----------------------+----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Cache H/M | Identifiers | Ordered by | Other |
+-----------------------+----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +ProduceResults | 0 | 1 | 0 | 0/0 | count(distinct fof.user_id) | | 0.0 |
| | +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +EagerAggregation | 0 | 1 | 326421 | 0/0 | count(distinct fof.user_id) | | 0.0 |
| | +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +AntiSemiApply | 0 | 256717 | 0 | 0/0 | anon[33], fof, user | user.user_id ASC | 0.0 |
| |\ +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| | +Expand(Into) | 0 | 0 | 8006149 | 0/0 | REL80, fof, user | | 0.0; (user)-[ REL80:FRIENDS]->(fof) |
| | | +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| | +Filter | 1 | 260120 | 520240 | 0/0 | fof, user | | 0.0; fof:User |
| | | +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| | +Argument | 1 | 260120 | 0 | 0/0 | fof, user | | 0.0 |
| | +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +Filter | 0 | 260120 | 260120 | 0/0 | anon[33], fof, user | user.user_id ASC | 0.0; fof:User |
| | +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +VarLengthExpand(All) | 0 | 260120 | 267999 | 0/0 | anon[33], fof, user | user.user_id ASC | 0.0; (user)-[anon[33]:FRIENDS*3..3]->(fof) |
| | +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +NodeIndexSeek | 1 | 1 | 3 | 0/0 | user | user.user_id ASC | 0.0; :User(user_id) |
+-----------------------+----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
Answer 0 (score: 2)
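Your WHERE clause contains a pattern, and evaluating that pattern requires additional DB hits for every fof. You can avoid these DB hits by keeping a list of all of user's direct friends in memory and changing the WHERE clause so that it only searches that list. (Based on your profile data, this could save 8006149 + 520240 DB hits, or more than 8.5 million, which is the vast majority of the hits for the whole query.)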
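Also, in your query, if the same fof node is matched more than once, the same WHERE test is performed each time. You can avoid this by filtering out duplicate fof nodes before performing the WHERE test. This also means you no longer need to remove duplicates afterwards.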
For example:
MATCH (user:User {user_id:1145})-[:FRIENDS]->(f:User)
WITH user, COLLECT(f) AS friends
MATCH (user)-[:FRIENDS*3]->(fof:User)
WITH DISTINCT friends, fof
WHERE NOT fof IN friends
RETURN COUNT(fof)
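To check whether the rewrite actually reduces DB hits on your data, the same query can be run with PROFILE, which is how an execution plan like the one in the question is produced. The following is only a sketch, assuming the same User label, FRIENDS relationship type, and user_id property as in the question; the numbers you see will depend on your graph, and the count only matches the original query if user_id is unique per User node.

```cypher
// PROFILE executes the query and reports rows and DB hits per operator,
// so the plan can be compared against the one shown in the question.
PROFILE
// Step 1: collect the user's direct friends once, into an in-memory list.
MATCH (user:User {user_id: 1145})-[:FRIENDS]->(f:User)
WITH user, collect(f) AS friends
// Step 2: expand exactly three FRIENDS hops from the same user.
MATCH (user)-[:FRIENDS*3]->(fof:User)
// Step 3: remove duplicate fof nodes before filtering, then test
// membership against the list instead of matching a pattern.
WITH DISTINCT friends, fof
WHERE NOT fof IN friends
RETURN count(fof)
```

In the resulting plan, the AntiSemiApply and Expand(Into) operators from the original query should no longer appear, since the pattern in the WHERE clause has been replaced by a list membership test.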