*Long* execution time - mutual interests

Posted: 2014-12-10 23:27:31

Tags: performance neo4j cypher

I'm running into a problem that I suspect is caused by my inability to write efficient Cypher queries and my general lack of Neo4j experience.

Background

I have a relatively large dataset, and it seems to choke when finding mutual likes between a user and their second-degree friends.

Current statistics:

neo4j-sh (?)$ dbinfo -g "Primitive count"
{
  "NumberOfNodeIdsInUse": 9343080,
  "NumberOfPropertyIdsInUse": 25416540,
  "NumberOfRelationshipIdsInUse": 47270718,
  "NumberOfRelationshipTypeIdsInUse": 8
}

------

Numbers:
Users: ~ 2 million
Likes: ~ 7 million
User likes: ~ 22 million

Indexes:

neo4j-sh (?)$ schema
Indexes
  ON :Employer(origin_id)       ONLINE (for uniqueness constraint)
  ON :Group(origin_id)          ONLINE (for uniqueness constraint)
  ON :Like(category)            ONLINE
  ON :Like(origin_id)           ONLINE (for uniqueness constraint)
  ON :Location(country_code)    ONLINE
  ON :Location(country)         ONLINE
  ON :Location(origin_id)       ONLINE (for uniqueness constraint)
  ON :School(origin_id)         ONLINE (for uniqueness constraint)
  ON :User(registered)          ONLINE
  ON :User(relationship_status) ONLINE
  ON :User(interested_in)       ONLINE
  ON :User(gender)              ONLINE
  ON :User(age)                 ONLINE
  ON :User(origin_id)           ONLINE (for uniqueness constraint)
  ON :User(uid)                 ONLINE (for uniqueness constraint)

Constraints
  ON (user:User) ASSERT user.uid IS UNIQUE
  ON (school:School) ASSERT school.origin_id IS UNIQUE
  ON (user:User) ASSERT user.origin_id IS UNIQUE
  ON (group:Group) ASSERT group.origin_id IS UNIQUE
  ON (employer:Employer) ASSERT employer.origin_id IS UNIQUE
  ON (like:Like) ASSERT like.origin_id IS UNIQUE
  ON (location:Location) ASSERT location.origin_id IS UNIQUE

Slow queries: http://pastebin.com/MPZ3aXCs

Problem:

For this user, the first query runs in about 12 seconds and returns 909 rows. That is still slow.

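For the same user, the second query runs in about 70 seconds. The immediate problem for me is that searching for common interests via the friend-of-friend match (line 33 of the pasted query) causes a sharp increase in execution time. I also noticed that adding this match seems to create a second EAGER "branch" in the profile. The CPU is completely pegged the whole time.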

If I step back and simply match the common interests between two users, the query runs in < 50 ms.

neo4j-sh (?)$ PROFILE MATCH (u:User {origin_id:2043})-[:LIKES]->(l:Like)<-[:LIKES]-(u2:User {origin_id:1212817}) return l;
3 rows

ColumnFilter
  |
  +Filter
    |
    +TraversalMatcher

+------------------+------+--------+-------------+--------------------------------------+
|         Operator | Rows | DbHits | Identifiers |                                Other |
+------------------+------+--------+-------------+--------------------------------------+
|     ColumnFilter |    3 |      0 |             |                       keep columns l |
|           Filter |    3 |      0 |             |      NOT(  UNNAMED31 ==   UNNAMED50) |
| TraversalMatcher |    3 |   1114 |             | u2,   UNNAMED50, u2,   UNNAMED31, u2 |
+------------------+------+--------+-------------+--------------------------------------+

Total database accesses: 1114
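For intuition on why this two-user query is fast: once both users are pinned down, finding their common interests is just an intersection of the two users' likes, so the work is bounded by how many things those two people like, not by the size of the database. A minimal in-memory sketch in Python; the `likes` dict and its ids are hypothetical, and this is not how Neo4j executes the query internally.

```python
from typing import Dict, Set


def mutual_likes(likes: Dict[int, Set[int]], a: int, b: int) -> Set[int]:
    """Return the Like ids that users a and b both like.

    likes maps a user's origin_id to the set of Like ids that user has a
    LIKES relationship to (a hypothetical in-memory stand-in for the graph).
    """
    # Python's set intersection iterates over the smaller set, so the cost
    # depends on these two users' like counts only.
    return likes.get(a, set()) & likes.get(b, set())


likes = {2043: {1, 2, 3, 4}, 1212817: {2, 3, 4, 9}}
print(sorted(mutual_likes(likes, 2043, 1212817)))  # [2, 3, 4]
```

The friends-of-friends query repeats this intersection once per candidate row, which is why the number of candidate rows dominates the running time.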

We are currently looking to extend this query to match users' third-degree friends, which at the moment seems impossible.
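Part of why each extra degree is so costly is that a variable-length pattern produces one row per path, and in a dense friendship graph the same person is reached through many different paths. A hedged sketch of the alternative, deduplicating to distinct people after every hop, in Python with a hypothetical `friends` adjacency dict (an illustration of the idea, not Neo4j code):

```python
from typing import Dict, Set


def friends_at_depth(friends: Dict[int, Set[int]], start: int, depth: int) -> Set[int]:
    """Return people whose shortest FRIENDS_WITH distance from start is exactly depth.

    friends maps a user id to the ids reached by that user's outgoing
    FRIENDS_WITH relationships (a hypothetical in-memory stand-in).
    """
    seen = {start}
    frontier = {start}
    for _ in range(depth):
        reached: Set[int] = set()
        for person in frontier:
            reached |= friends.get(person, set())
        # Deduplicate between hops: a person reachable by many paths is kept
        # once, and anyone already found at a shorter distance is dropped.
        frontier = reached - seen
        seen |= frontier
    return frontier


friends = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
print(friends_at_depth(friends, 1, 2))  # {4}
print(friends_at_depth(friends, 1, 3))  # {5}
```

Person 4 is reachable from 1 through both 2 and 3 but appears only once. In Cypher, the rough counterpart is collapsing rows with an aggregation or `WITH DISTINCT` between hops rather than expanding every path to the final depth.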

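I should also note that I'm running this on a standalone AWS c3.xlarge (4 vCPU / 8 GB RAM) that does nothing except host Neo4j. The server configuration is more or less the standard defaults. I'm happy to provide it if necessary.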

Ideally, I'd like to return this information in a single query, since it is processed afterwards.

Any help optimizing these queries is greatly appreciated. Please let me know if I've missed any key information.

Edit: using Neo4j 2.1.6

Edit 2:

I made some changes to the query that seem to have cut the number of db hits in half. The query now takes about 16 seconds.

The new query, together with its profile, is available here: http://pastebin.com/UyFi89H7

Apart from using additional criteria to filter the friends of friends, are there any further optimizations I can make?

2 Answers

Answer 0 (score: 1)

First of all, kudos for a very detailed question.

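Second, looking at the beginning of your Cypher query, my advice is to start from a small starting point: for example, first match your user, then pass it on to the next step with WITH. Then retrieve the user's location and pass both the user and the location along with WITH.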

As you can see in the profile of your first query, it starts with a TraversalMatcher instead of benefiting from the label and property index.

A first optimization to put you on the right track:

PROFILE
MATCH (user:User {origin_id:138})
WITH user
MATCH (user)-[r:LIVES_IN]->(userLoc:Location), (user)-[fr:FRIENDS_WITH*2]->(fof:User)
WHERE
    user.origin_id <> fof.origin_id
    AND NOT (user)-[:FRIENDS_WITH]->(fof)

With the query above, Neo4j will use the index to retrieve your user instead of the TraversalMatcher.
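The difference described here is roughly the difference between scanning records and looking one up by key. A loose analogy in Python (hypothetical user dicts; Neo4j's schema indexes are not Python dicts) showing that both approaches find the same user, but only the keyed lookup avoids touching every record:

```python
from typing import Dict, List, Optional


def find_by_scan(users: List[dict], origin_id: int) -> Optional[dict]:
    # Without an index: inspect users one by one until origin_id matches.
    for user in users:
        if user["origin_id"] == origin_id:
            return user
    return None


def build_index(users: List[dict]) -> Dict[int, dict]:
    # Analogous to a schema index on :User(origin_id): a key -> record map.
    return {user["origin_id"]: user for user in users}


def find_by_index(index: Dict[int, dict], origin_id: int) -> Optional[dict]:
    # With an index: one dictionary lookup, whose average cost does not
    # grow with the number of users.
    return index.get(origin_id)


users = [{"origin_id": i, "name": f"user{i}"} for i in range(100_000)]
index = build_index(users)
print(find_by_index(index, 138) == find_by_scan(users, 138))  # True
```

With ~2 million users, starting from a keyed lookup and expanding outward from that single node is what keeps the first step cheap.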

Answer 1 (score: 0)

You could also try:

MATCH (u:User {origin_id:2043}),(u2:User {origin_id:1212817})
MATCH path = allShortestPaths((u)-[:LIKES*..2]-(u2))
RETURN nodes(path)[1] as like

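Try to reduce cardinality to the minimum as early as possible. Instead of matching each fof multiple times, first aggregate down to a single instance per fof, and then match: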

PROFILE
MATCH (user:User {origin_id:138})-[:LIVES_IN]->(userLoc:Location)-[:IN_COUNTRY]->(country)
MATCH (user)-[fr:FRIENDS_WITH]->(friend:User)-[fofr:FRIENDS_WITH]->(fof:User)
WHERE (fof.dob_age <= 35 AND fof.dob_age >= 20)
WITH user, count(distinct friend) as mutual_friend_count, collect(distinct friend) as mutual_friends, fof,
     (ABS(user.dob_age - fof.dob_age)) as age_diff, userLoc, country

MATCH (fof)-[:LIVES_IN]->(fofLoc:Location)-[:IN_COUNTRY]->(country)

RETURN
    fof.origin_id as fof_origin_id,
    fof.first_name as fof_first_name,
    fof.last_name as fof_last_name,
    fof.dob_age as fof_age,
    user.dob_age as user_age,
    userLoc.latitude as user_loc_latitude,
    userLoc.longitude as user_loc_longitude,
    fofLoc.name as fof_loc_name,
    fofLoc.latitude as fof_loc_latitude,
    fofLoc.longitude as fof_loc_longitude,
    age_diff as age_diff,
    mutual_friend_count, mutual_friends
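The aggregate-then-match idea can be sketched outside the database to show where the savings come from: the user to friend to fof expansion yields one row per path, the WITH groups those rows into one per distinct fof (together with its mutual friends), and only then does the location check run. A minimal Python sketch, assuming hypothetical in-memory dicts `friends` (outgoing FRIENDS_WITH), `age` (dob_age) and `country` (the country reached via LIVES_IN / IN_COUNTRY); unlike the query above, it also skips the user themselves:

```python
from collections import defaultdict
from typing import Dict, Set


def fofs_in_same_country(
    friends: Dict[int, Set[int]],
    age: Dict[int, int],
    country: Dict[int, str],
    user: int,
    min_age: int = 20,
    max_age: int = 35,
) -> Dict[int, Set[int]]:
    """Map each qualifying friend-of-friend to the set of mutual friends."""
    user_country = country.get(user)
    if user_country is None:
        return {}
    # Stage 1: expand user -> friend -> fof. Each (friend, fof) pair is one
    # path row; grouping by fof collapses them to one entry per person.
    mutual: Dict[int, Set[int]] = defaultdict(set)
    for friend in friends.get(user, set()):
        for fof in friends.get(friend, set()):
            if fof != user and min_age <= age.get(fof, -1) <= max_age:
                mutual[fof].add(friend)
    # Stage 2: the location check runs once per distinct fof, not once per path.
    return {fof: mf for fof, mf in mutual.items() if country.get(fof) == user_country}


friends = {1: {2, 3}, 2: {1, 4, 5}, 3: {1, 4}, 4: {2, 3}, 5: {2}}
age = {1: 30, 2: 31, 3: 29, 4: 25, 5: 50}
country = {uid: "IE" for uid in friends}
result = fofs_in_same_country(friends, age, country, 1)
print({fof: sorted(mf) for fof, mf in result.items()})  # {4: [2, 3]}
```

Here fof 4 is reached through two friends but its country is checked only once, and `len(result[4])` plays the role of `mutual_friend_count`.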