Neo4j SDN 4 GraphId performance and index

Date: 2017-05-26 19:55:47

Tags: neo4j cypher spring-data-neo4j-4

In my Neo4j / SDN 4 application, all of my Cypher queries are based on internal Neo4j IDs.

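This is a problem because I can't rely on these IDs in my web application URLs. Neo4j can reuse these IDs, so there is a good chance that at some point in the future the same ID will point to a different node.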

I tried to reimplement this logic based on the following solution: "Using the graph to control unique id generation", but noticed a degradation in query performance.

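From a theoretical point of view, should a Cypher query based on a property annotated with @Index(unique = true, primary = true), for example:

@Index(unique = true, primary = true)
private Long uid;

entity.uid = {someId}

perform the same as a Cypher query based on the internal Neo4j ID?

id(entity) = {someId}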

UPDATED

Here is the :schema output:

Indexes
   ON :BaseEntity(uid) ONLINE
   ON :Characteristic(lowerName) ONLINE
   ON :CharacteristicGroup(lowerName) ONLINE
   ON :Criterion(lowerName) ONLINE
   ON :CriterionGroup(lowerName) ONLINE
   ON :Decision(lowerName) ONLINE
   ON :FlagType(name) ONLINE (for uniqueness constraint)
   ON :HAS_VALUE_ON(value) ONLINE
   ON :HistoryValue(originalValue) ONLINE
   ON :Permission(code) ONLINE (for uniqueness constraint)
   ON :Role(name) ONLINE (for uniqueness constraint)
   ON :User(email) ONLINE (for uniqueness constraint)
   ON :User(username) ONLINE (for uniqueness constraint)
   ON :Value(value) ONLINE

Constraints
   ON ( flagtype:FlagType ) ASSERT flagtype.name IS UNIQUE
   ON ( permission:Permission ) ASSERT permission.code IS UNIQUE
   ON ( role:Role ) ASSERT role.name IS UNIQUE
   ON ( user:User ) ASSERT user.email IS UNIQUE
   ON ( user:User ) ASSERT user.username IS UNIQUE

As you can see, I have an index on :BaseEntity(uid).

BaseEntity is the base class in my entity hierarchy, for example:

@NodeEntity
public abstract class BaseEntity {

    @GraphId
    private Long id;

    @Index(unique = false)
    private Long uid;

    private Date createDate;

    private Date updateDate;

...

}

@NodeEntity
public class Commentable extends BaseEntity {
...
}

@NodeEntity
public class Decision extends Commentable {

    private String name;

}

Will this index be used when I look up by uid, for example with (d:Decision) WHERE d.uid = {uid}?

PROFILE results - internal ID vs. indexed property

Query based on internal IDs

PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision) 
WHERE id(parentD) = 1474333 
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199) 
WHERE id(filterCharacteristic1475199) = 1475199 
WITH relationshipValueRel1475199, childD 
WHERE  ([1, 19][0] <= relationshipValueRel1475199.value <=  [1, 19][1] )  
WITH childD  
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358) 
WHERE id(filterCharacteristic1474358) = 1474358 
WITH relationshipValueRel1474358, childD 
WHERE  (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value ))  
WITH childD  
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193) 
WHERE id(filterCharacteristic1475193) = 1475193 
WITH relationshipValueRel1475193, childD 
WHERE  (ANY (id IN ['16:9', '3:2', '4:3', '1:1'] 
WHERE id IN relationshipValueRel1475193.value ))  
WITH childD  
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c) 
WHERE id(c) IN [1474342, 1474343, 1474340, 1474339, 1474336, 1474352, 1474353, 1474350, 1474351, 1474348, 1474346, 1474344] 
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes 
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)  
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes  
ORDER BY  weight DESC 
SKIP 0 LIMIT 10 
RETURN ru, u, childD AS decision, weight, totalVotes, 
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity),  types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups, 
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1),  weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria, 
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD)  WHERE NOT ((ch1)<-[:DEPENDS_ON]-())  | {characteristicId: id(ch1),  value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics

PROFILE output:

Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 350554 total db hits in 238 ms.


Query based on the indexed property uid

PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision) 
WHERE parentD.uid = 61 
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199) 
WHERE filterCharacteristic1475199.uid = 15 
WITH relationshipValueRel1475199, childD 
WHERE  ([1, 19][0] <= relationshipValueRel1475199.value <=  [1, 19][1] )  
WITH childD  
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358) 
WHERE filterCharacteristic1474358.uid = 10 
WITH relationshipValueRel1474358, childD 
WHERE  (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value ))  
WITH childD  
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193) 
WHERE filterCharacteristic1475193.uid = 14 
WITH relationshipValueRel1475193, childD 
WHERE  (ANY (id IN ['16:9', '3:2', '4:3', '1:1'] 
WHERE id IN relationshipValueRel1475193.value ))  
WITH childD  
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c) 
WHERE c.uid IN [26, 27, 24, 23, 20, 36, 37, 34, 35, 32, 30, 28] 
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes 
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)  
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes  
ORDER BY  weight DESC 
SKIP 0 LIMIT 10 
RETURN ru, u, childD AS decision, weight, totalVotes, 
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity),  types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups, 
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1),  weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria, 
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD)  WHERE NOT ((ch1)<-[:DEPENDS_ON]-())  | {characteristicId: id(ch1),  value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics

Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 671326 total db hits in 426 ms.


Is there any chance to improve the performance of the uid-based query?

1 answer:

Answer 0 (score: 5)

You should not use Neo4j internal IDs in URLs, because they can be reused after a node is deleted.
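To make the reuse problem concrete, here is a minimal Java sketch of a store that recycles freed ids. It is a deliberately simplified toy model, not Neo4j's actual store code; the class name RecyclingIdStoreSketch and its methods are hypothetical and exist only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy model of a store that hands freed ids out again (illustrative only). */
final class RecyclingIdStoreSketch {
    private final Deque<Long> freedIds = new ArrayDeque<>();   // ids released by deletes
    private final Map<Long, String> nodes = new HashMap<>();   // id -> node payload
    private long nextId = 0;

    /** Creates a node, preferring a previously freed id over a brand new one. */
    long create(String name) {
        long id = freedIds.isEmpty() ? nextId++ : freedIds.pop();
        nodes.put(id, name);
        return id;
    }

    /** Deletes a node and makes its id available for reuse. */
    void delete(long id) {
        if (nodes.remove(id) != null) {
            freedIds.push(id);
        }
    }

    /** Returns the payload stored under the id, or null if there is none. */
    String get(long id) {
        return nodes.get(id);
    }
}
```

In this model, a URL that was bookmarked with the id of a deleted node would later resolve to whatever node was created next, which is why an application-level key such as uid is the safer thing to expose.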

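From a performance point of view, the internal id is as fast as it gets: it is effectively an offset into the file holding the node/relationship records (you may have noticed that these are 2 separate id sequences, so you can have a node with id = x and a relationship with the same id = x).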

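Any use of an index has to be slower, because the database first does the index lookup, obtains the internal id, and only then reads the node record.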
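The two access paths can be sketched in Java as follows. This is a toy model under stated assumptions: an array stands in for the record file, a HashMap stands in for the index, and the class LookupCostSketch with its steps counter is hypothetical. It only illustrates that the uid path touches one more structure than the internal-id path; it says nothing about real timings.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: direct offset access versus index lookup followed by a record read. */
final class LookupCostSketch {
    private final String[] records;                                // array position plays the role of the internal id
    private final Map<Long, Integer> uidIndex = new HashMap<>();   // uid -> internal id
    int steps;                                                     // structures touched by the last lookup

    LookupCostSketch(int capacity) {
        records = new String[capacity];
    }

    /** Stores a record at the given internal id and registers its uid in the index. */
    void put(int internalId, long uid, String payload) {
        records[internalId] = payload;
        uidIndex.put(uid, internalId);
    }

    /** Internal id: a single read at a known offset. */
    String byInternalId(int internalId) {
        steps = 1;
        return records[internalId];
    }

    /** uid: first resolve the internal id through the index, then read the record. */
    String byUid(long uid) {
        steps = 1;
        Integer internalId = uidIndex.get(uid);
        if (internalId == null) {
            return null;
        }
        steps = 2;
        return records[internalId];
    }
}
```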

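However, for the vast majority of applications the performance difference is negligible, probably much smaller than network latency or general OGM overhead.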

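If you see a noticeable difference:

  • verify that the index exists in the database (e.g. with :schema in the Neo4j browser)
  • enable logging and verify that your queries have the correct labels (set the org.neo4j.ogm level to info)
  • if the index exists and the query contains the correct labels, use PROFILE to examine the query plan

UPDATED

Yes, the index will be used for the following query: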

MATCH (d:Decision) WHERE d.uid = {uid} ...

which should be generated by

session.load(Decision.class, uid)

if your index is primary, or by findByUid on DecisionRepository.

Note that the index might not be used when the WHERE clause appears in the middle of a query:

...
WITH x
MATCH (x)-[...]-(d) WHERE d.uid = {uid} ...

This depends on the query plan, and you should use PROFILE to investigate.
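One thing worth checking in the uid-based query from the question is that parentD and the filterCharacteristic nodes are matched without a label. Schema indexes are defined per label and property, so the planner can only consider the :BaseEntity(uid) index when the pattern names that label. The Java sketch below holds two Cypher strings to contrast the forms. It assumes the nodes carry the BaseEntity label (as the :schema output suggests); the class UidQuerySketch, the parentUid parameter name, and the shortened RETURN clause are hypothetical. The USING INDEX hint is optional and only asks the planner to start from the index.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical holder for query text; not part of SDN or Neo4j OGM. */
final class UidQuerySketch {

    // No label on parentD: the :BaseEntity(uid) index cannot be chosen for this predicate.
    static final String UNLABELED =
        "MATCH (parentD)-[:CONTAINS]->(childD:Decision) "
      + "WHERE parentD.uid = {parentUid} RETURN childD";

    // Label added (assumption: nodes carry :BaseEntity), plus an explicit index hint.
    static final String LABELED_WITH_HINT =
        "MATCH (parentD:BaseEntity)-[:CONTAINS]->(childD:Decision) "
      + "USING INDEX parentD:BaseEntity(uid) "
      + "WHERE parentD.uid = {parentUid} RETURN childD";

    /** Builds the parameter map expected by both queries. */
    static Map<String, Object> params(long parentUid) {
        return Collections.<String, Object>singletonMap("parentUid", parentUid);
    }
}
```

Running both variants with PROFILE should show whether the plan switches to an index seek on :BaseEntity(uid) and how the db hits change.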