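I'm new to Cypher. I'm using GrapheneDB and py2neo (version 2.0.2) to build a simple graph. The graph has Repository, Organization and People nodes, and two relationship types: IN_ORGANIZATION and IS_ACTOR. Below is the code that creates the nodes and relationships (entire code on GitHub, see lines 88-108):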
from py2neo import Relationship

# Create repository node if one does not exist
r = graph.merge_one("Repository", "id", record["full_name"])
# Update timestamp with time now in epoch milliseconds
r.properties["created_at"] = MyMoment.TNEM()
# Apply property change
r.push()
...
# Create organization node if one does not exist
o = graph.merge_one("Organization", "id", record["organization"])
# Update timestamp with time now in epoch milliseconds
o.properties["created_at"] = MyMoment.TNEM()
# Apply property change
o.push()
rel = Relationship(r, "IN_ORGANIZATION", o)
# Create a unique relationship between repository and organization;
# ignored if the relationship already exists
graph.create_unique(rel)
...
# Create people node if one does not exist
p = graph.merge_one("People", "id", al)
# Update timestamp with time now in epoch milliseconds
p.properties["created_at"] = MyMoment.TNEM()
# Apply property change
p.push()
rel = Relationship(r, "IS_ACTOR", p)
# Create a unique relationship between repository and people;
# ignored if the relationship already exists
graph.create_unique(rel)
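The code above works fine for small datasets. As the dataset grows to roughly 20K nodes created/merged and ~15K relationships per hour, processing one hour's data takes longer than an hour (sometimes several hours). I need to reduce the processing time. What alternatives could I explore? I was thinking about batch mode, but how would I use it together with merge_one and create_unique? Any ideas?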
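One possible direction, sketched here under several assumptions rather than as a tested recipe, is to move the merging into Cypher and send many records per request instead of issuing merge_one, push and create_unique calls for every record. The code below only builds (statement, parameters) pairs. Each statement takes a list parameter and processes it with UNWIND (available since Neo4j 2.1), and MERGE on the relationship stands in for create_unique. It also lists uniqueness constraints on the id property of each label; in Neo4j a uniqueness constraint is backed by an index, so MERGE can look nodes up instead of scanning the label. The record keys ("full_name", "organization", "actors"), the single now_ms timestamp per run, the "{rows}" placeholder syntax (Neo4j 2.x; newer servers use "$rows"), and the helper names chunked and build_batches are all hypothetical.

```python
from itertools import islice

# Sketch only: "{rows}" is the Neo4j 2.x parameter syntax (an assumption about
# the server behind py2neo 2.0.2); Neo4j 3.x+ uses "$rows".
# chunked/build_batches and the record keys are hypothetical.

# Uniqueness constraints (Neo4j 2.x syntax); each one is backed by an index
# on :Label(id), so MERGE on id can use an index lookup.
CONSTRAINTS = [
    "CREATE CONSTRAINT ON (n:%s) ASSERT n.id IS UNIQUE" % label
    for label in ("Repository", "Organization", "People")
]

# Each statement handles a whole batch of rows via UNWIND;
# MERGE on the relationship plays the role of create_unique.
REPO_STATEMENT = """
UNWIND {rows} AS row
MERGE (r:Repository {id: row.repo})
SET r.created_at = row.ts
"""

ORG_STATEMENT = """
UNWIND {rows} AS row
MATCH (r:Repository {id: row.repo})
MERGE (o:Organization {id: row.org})
SET o.created_at = row.ts
MERGE (r)-[:IN_ORGANIZATION]->(o)
"""

ACTOR_STATEMENT = """
UNWIND {rows} AS row
MATCH (r:Repository {id: row.repo})
MERGE (p:People {id: row.actor})
SET p.created_at = row.ts
MERGE (r)-[:IS_ACTOR]->(p)
"""


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_batches(records, now_ms, batch_size=1000):
    """Return (statement, parameters) pairs: repositories first, then relationships.

    Assumes each record is a dict with "full_name", an optional
    "organization", and an optional "actors" list of People ids.
    """
    repo_rows, org_rows, actor_rows = [], [], []
    for record in records:
        repo = record["full_name"]
        repo_rows.append({"repo": repo, "ts": now_ms})
        if record.get("organization"):
            org_rows.append({"repo": repo, "org": record["organization"], "ts": now_ms})
        for actor in record.get("actors", []):
            actor_rows.append({"repo": repo, "actor": actor, "ts": now_ms})

    # Repository batches come first so the MATCH in later statements finds them.
    batches = []
    for statement, rows in ((REPO_STATEMENT, repo_rows),
                            (ORG_STATEMENT, org_rows),
                            (ACTOR_STATEMENT, actor_rows)):
        for chunk in chunked(rows, batch_size):
            batches.append((statement, {"rows": chunk}))
    return batches
```

The constraint statements would be run once, before loading. Each batch pair could then be sent with graph.cypher.execute(statement, params), or appended to a transaction opened with graph.cypher.begin() and committed once; these py2neo 2.0 calls should be checked against the 2.0 documentation. The design choice is to trade several HTTP round trips per record for one request per batch of rows; the batch size is a tuning knob, not a recommended value.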