Neo4j collaborative filtering slower than expected

Time: 2015-12-09 09:22:21

Tags: neo4j query-optimization cypher query-performance collaborative-filtering

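I'm working on implementing a recommendation system on top of our Neo4j graph. I've started looking at the query I plan to use, but it runs much slower than I expected.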

Stats

Neo4J Version: 2.3.1
Nodes: 820K
Relationships: 7.6M

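I've done a fair amount of research into query optimization, and as far as I can tell my query structure doesn't fall into any of the common pitfalls (but I'm no expert).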

Here is a dev console with a test dataset: http://console.neo4j.org/r/b7jk2b

Query

MATCH (u1:User {id: {user_id}})-[l1:LIKES]->(p1:Product)
WITH u1, l1, p1
ORDER BY p1.created_at DESC
LIMIT 10

MATCH (p1)<-[:LIKES]-(u2:User)
WHERE NOT u1=u2
WITH u1, l1, p1, u2, COUNT(u2) as rating
ORDER BY rating DESC
LIMIT 50

MATCH (u2)-[l2:LIKES]->(recommendation:Product)
WHERE NOT (p1)=(recommendation)
WITH recommendation, COUNT(recommendation) as weight
RETURN recommendation.id as id
ORDER BY weight DESC
LIMIT {limit}

Our indexes

Indexes
ON :LIKES(created_at)     ONLINE  
ON :Product(id)           ONLINE  
ON :Product(created_at)   ONLINE  
ON :User(id)              ONLINE  
ON :User(date_joined)     ONLINE

No constraints

Query profile output (run against a copy of our production dataset)

+-------------------+----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| Operator          | Estimated Rows | Rows   | DB Hits | Identifiers                                | Other                                                   |
+-------------------+----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +ProduceResults   |              7 |    100 |       0 | id                                         | id                                                      |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +Projection       |              7 |    100 |       0 | anon[382], id, recommendation, weight      | anon[382]                                               |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +Top              |              7 |    100 |       0 | anon[382], recommendation, weight          | Literal(100); weight                                    |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +Projection       |              7 | 129342 |  129342 | anon[382], recommendation, weight          | recommendation.id; weight                               |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +EagerAggregation |              7 | 129342 |       0 | recommendation, weight                     | recommendation                                          |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +Filter           |             44 | 442432 |  471953 | l1, l2, p1, rating, recommendation, u1, u2 | Ands(NOT(p1 == recommendation), recommendation:Product) |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +Expand(All)      |             44 | 472039 |  472089 | l1, l2, p1, rating, recommendation, u1, u2 | (u2)-[l2:LIKES]->(recommendation)                       |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +Top              |             10 |     50 |       0 | l1, p1, rating, u1, u2                     | Literal(50); rating                                     |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +EagerAggregation |             10 |    527 |       0 | l1, p1, rating, u1, u2                     | u1, l1, p1, u2                                          |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +Filter           |             92 |    563 |     563 | anon[82], anon[119], l1, p1, u1, u2        | Ands(NOT(u1 == u2), u2:User)                            |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +Expand(All)      |             92 |    574 |     584 | anon[82], anon[119], l1, p1, u1, u2        | (p1)<-[:LIKES]-(u2)                                     |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +Top              |              5 |     10 |       0 | anon[82], l1, p1, u1                       | Literal(10);                                            |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +Projection       |              5 |     42 |      42 | anon[82], l1, p1, u1                       | u1; l1; p1; p1.created_at                               |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +Filter           |              5 |     42 |     413 | l1, p1, u1                                 | p1:Product                                              |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +Expand(All)      |              6 |    413 |     414 | l1, p1, u1                                 | (u1)-[l1:LIKES]->(p1)                                   |
| |                 +----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+
| +NodeIndexSeek    |              1 |      1 |       2 | u1                                         | :User(id)                                               |
+-------------------+----------------+--------+---------+--------------------------------------------+---------------------------------------------------------+

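I've seen case studies of people using Neo4j for real-time collaborative filtering, so I assume it must be possible to get this kind of query running well on a dataset of this size. Am I being unrealistic? We're running this on an Amazon EC2 Compute-Optimized node (c4.large), so I'd expect reasonable performance.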

I'm scratching my head here and would really appreciate any input.

Cheers, David.

1 answer:

Answer 0 (score: 0)

[Aside: The dev console, when reopened, does not re-create indexes, so they have to be manually recreated.]
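
As a rough sketch of that manual step, the two indexes below are the ones from the question's index list that this query's node labels could use. The syntax is the Neo4j 2.x form, and I am assuming the console accepts one statement per run:

```cypher
// Recreate the index behind the NodeIndexSeek on :User(id) in the profile.
CREATE INDEX ON :User(id)
```

```cypher
// Optional: the index on Product ids, as listed in the question.
CREATE INDEX ON :Product(id)
```

Running :schema afterwards should list both as ONLINE once they have been populated.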

I don't know if this is good enough for you, but you can eliminate about 44% of the DB hits in your profiled results by simply not specifying the labels for most of the nodes (p1, u2, and recommendation) in your query:

MATCH (u1:User {id: {user_id}})-[l1:LIKES]->(p1)
WITH u1, l1, p1
ORDER BY p1.created_at DESC
LIMIT 10

MATCH (p1)<-[:LIKES]-(u2)
WHERE NOT u1=u2
WITH u1, l1, p1, u2, COUNT(u2) as rating
ORDER BY rating DESC
LIMIT 50

MATCH (u2)-[l2:LIKES]->(recommendation)
WHERE NOT (p1)=(recommendation)
WITH recommendation, COUNT(recommendation) as weight
RETURN recommendation.id as id
ORDER BY weight DESC
LIMIT {limit}

The label for u1 should still be specified in the query, since that allows Cypher to use the index on :User(id). In general, evaluate a query carefully to see which node labels can be dropped. In your case, the p1, u2, and recommendation nodes are found by following relationships (and, I presume, the LIKES relationship type is only used to point to Product nodes), so specifying their labels is superfluous and causes unnecessary work.
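
Dropping the labels is only safe if that presumption about LIKES holds. A quick way to check it, as a minimal sketch (the column aliases are my own, and the query touches every LIKES relationship, so it is best run once against a copy of the data):

```cypher
// Group all LIKES relationships by the labels of their start and end nodes.
// If the only row returned is ["User"] -> ["Product"], the labels on
// p1, u2 and recommendation add no filtering and can be omitted.
MATCH (a)-[:LIKES]->(b)
RETURN labels(a) AS start_labels, labels(b) AS end_labels, count(*) AS likes
ORDER BY likes DESC
```

If any other label combination shows up, keep the corresponding label checks in the query, otherwise the results could change.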

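The profile results for the above query will have a DB Hits value of 0 for all the Filter steps (and in one case, the Filter step will be eliminated entirely). In the profile from the question, the three Filter steps account for 471,953 + 563 + 413 = 472,929 of the 1,075,402 total DB hits, which matches the roughly 44% figure.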