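I'm importing several hundred thousand nodes with py2neo. I've built a defaultdict (`city_neighborhood_map`) that maps each city to its neighborhoods. Part of the motivation is to import these relationships more efficiently, since importing them with Neo4j's load tool failed.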
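I departed from implementations like the one in the OP of this post, because the batch documentation recommends avoiding that approach and suggests using Cypher instead. However, I like being able to create the nodes directly from the defaultdict I built. I also found it too difficult to import this data the way the first link demonstrates.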
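For reference, a `city -> neighborhoods` map like the one described can be built with `collections.defaultdict`. This is a minimal sketch: the input format (an iterable of `(city, neighborhood)` pairs) and the name `build_city_neighborhood_map` are hypothetical, not taken from the question.

```python
from collections import defaultdict


def build_city_neighborhood_map(pairs):
    """Group hypothetical (city, neighborhood) pairs into city -> [neighborhoods]."""
    city_neighborhood_map = defaultdict(list)
    for city_name, neighborhood_name in pairs:
        # defaultdict creates the empty list the first time a city is seen
        city_neighborhood_map[city_name].append(neighborhood_name)
    return city_neighborhood_map


rows = [("Boston", "Back Bay"), ("Boston", "Fenway"), ("Chicago", "Loop")]
city_neighborhood_map = build_city_neighborhood_map(rows)
print(dict(city_neighborhood_map))
# {'Boston': ['Back Bay', 'Fenway'], 'Chicago': ['Loop']}
```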
To speed up the import, should I create a Cypher transaction (and commit every 10,000 statements) instead of using the following loop?
for city_name, neighborhood_names in city_neighborhood_map.items():
    city_node = graph.find_one(label="City", property_key="Name", property_value=city_name)
    for neighborhood_name in neighborhood_names:
        neighborhood_node = Node("Neighborhood", Name=neighborhood_name)
        rel = Relationship(neighborhood_node, "IN", city_node)
        graph.create(rel)
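Committing every N statements mostly needs a way to cut the stream of parameters into fixed-size chunks. Below is a minimal, py2neo-independent sketch; `iter_params` and `chunked` are hypothetical helpers, and the commented loop at the end assumes the py2neo 2.x `graph.cypher.begin()` / `tx.append()` / `tx.commit()` calls already used in this thread.

```python
from itertools import islice


def iter_params(city_neighborhood_map):
    """Yield one parameter dict per (city, neighborhood) relationship."""
    for city_name, neighborhood_names in city_neighborhood_map.items():
        for neighborhood_name in neighborhood_names:
            yield {"City_Name": city_name, "Neighborhood_Name": neighborhood_name}


def chunked(iterable, size):
    """Yield lists of at most `size` items taken from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Assumed usage with py2neo 2.x (not executed here):
# for batch in chunked(iter_params(city_neighborhood_map), 10000):
#     tx = graph.cypher.begin()
#     for params in batch:
#         tx.append(statement, params)
#     tx.commit()
```

Each `tx.commit()` then covers at most 10,000 statements, so no single transaction grows with the size of the whole dataset.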
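I get a timeout, and the following approach seems slow. Even when I break up the transaction so that it commits every 1,000 neighborhoods, it still runs very slowly.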
tx = graph.cypher.begin()
statement = "MERGE (city {Name:{City_Name}}) CREATE (neighborhood { Name : {Neighborhood_Name}}) CREATE (neighborhood)-[:IN]->(city)"
for city_name, neighborhood_names in city_neighborhood_map.items():
    for neighborhood_name in neighborhood_names:
        tx.append(statement, {"City_Name": city_name, "Neighborhood_Name": neighborhood_name})
tx.commit()
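A different technique, not used in the question or the answer: send one statement per batch and let Cypher iterate over a list parameter with `UNWIND`, so the server receives one statement per batch instead of one per relationship. This sketch only builds the statement and its parameters; it assumes a Neo4j version with `UNWIND` (2.1+) and the `{param}` placeholder syntax used elsewhere in this thread. It also puts the `:City` label on the `MERGE`: without a label, as in the snippet above, Neo4j cannot use a label/property index and has to scan every node for each merge.

```python
UNWIND_STATEMENT = (
    "UNWIND {rows} AS row "
    "MERGE (city:City {Name: row.city}) "
    "CREATE (n:Neighborhood {Name: row.neighborhood}) "
    "CREATE (n)-[:IN]->(city)"
)


def unwind_batches(city_neighborhood_map, size):
    """Yield parameter dicts {"rows": [...]} holding at most `size` rows each."""
    rows = []
    for city_name, neighborhood_names in city_neighborhood_map.items():
        for neighborhood_name in neighborhood_names:
            rows.append({"city": city_name, "neighborhood": neighborhood_name})
            if len(rows) == size:
                yield {"rows": rows}
                rows = []  # start a fresh list; the yielded one is left untouched
    if rows:
        yield {"rows": rows}


# Assumed usage with py2neo 2.x (not executed here):
# for params in unwind_batches(city_neighborhood_map, 10000):
#     tx = graph.cypher.begin()
#     tx.append(UNWIND_STATEMENT, params)
#     tx.commit()
```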
It would be great to keep a reference to each city node, so I don't have to look it up every time I merge.
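Keeping a reference to each city is essentially memoization: look each city up once and reuse the result. A minimal sketch; `make_cached_lookup` is a hypothetical helper, and the commented usage assumes the py2neo 2.x `graph.find_one` call from the first snippet. It only pays off when the same city name is looked up more than once, for example when the input is a flat list of pairs rather than grouped by city.

```python
def make_cached_lookup(lookup):
    """Wrap `lookup(name)` so each distinct name is looked up only once."""
    cache = {}

    def cached(name):
        if name not in cache:
            cache[name] = lookup(name)  # first request: do the real lookup
        return cache[name]              # later requests: reuse the stored result

    return cached


# Assumed usage with py2neo 2.x (not executed here):
# find_city = make_cached_lookup(
#     lambda name: graph.find_one("City", "Name", name))
# city_node = find_city(city_name)
```

On Python 3, `functools.lru_cache(maxsize=None)` gives the same behavior for a plain function.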
Answer (score: 2)
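It will probably be faster to do this in two passes: first CREATE all nodes, with uniqueness constraints in place (this should be very fast), then CREATE the relationships in a second pass.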
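The two-pass plan can be sketched as splitting the map into three parameter lists: cities, distinct neighborhoods, and (city, neighborhood) pairs for the relationships. `plan_two_pass` is a hypothetical helper that only prepares parameters; the parameter keys match the Cypher statements in this answer.

```python
def plan_two_pass(city_neighborhood_map):
    """Split city -> [neighborhoods] into node parameters and relationship parameters."""
    city_params = [{"name": city} for city in city_neighborhood_map]
    seen = set()
    neighborhood_params = []
    rel_params = []
    for city_name, neighborhood_names in city_neighborhood_map.items():
        for neighborhood_name in neighborhood_names:
            if neighborhood_name not in seen:
                # pass 1: each neighborhood node is created only once
                seen.add(neighborhood_name)
                neighborhood_params.append({"name": neighborhood_name})
            # pass 2: one relationship per (city, neighborhood) pair
            rel_params.append({"City_Name": city_name,
                               "Neighborhood_Name": neighborhood_name})
    return city_params, neighborhood_params, rel_params
```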
First, create uniqueness constraints on the labels City and Neighborhood, which makes the later MATCH faster:
graph.schema.create_uniqueness_constraint('City', 'Name')
graph.schema.create_uniqueness_constraint('Neighborhood', 'Name')
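One assumption behind the `Neighborhood` constraint is worth checking first: it makes `Name` unique across all neighborhoods, not per city. If two cities each have a neighborhood with the same name, a second CREATE with that name violates the constraint, and deduplicating the names means both cities link to one shared node. A small sketch to list such names beforehand; `shared_neighborhood_names` is a hypothetical helper.

```python
from collections import Counter


def shared_neighborhood_names(city_neighborhood_map):
    """Return neighborhood names that appear under more than one city, sorted."""
    counts = Counter(
        name
        for names in city_neighborhood_map.values()
        for name in set(names)  # count each name at most once per city
    )
    return sorted(name for name, count in counts.items() if count > 1)
```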
Create all the nodes:
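tx = graph.cypher.begin()
statement = "CREATE (:City {Name: {name}})"
for city_name in city_neighborhood_map.keys():
    tx.append(statement, {"name": city_name})
statement = "CREATE (:Neighborhood {Name: {name}})"
# Collect every distinct neighborhood name across all cities; creating the
# same name twice would violate the uniqueness constraint.
all_neighborhood_names = set()
for neighborhood_names in city_neighborhood_map.values():
    all_neighborhood_names.update(neighborhood_names)
for neighborhood_name in all_neighborhood_names:
    tx.append(statement, {"name": neighborhood_name})  # key must be the string "name"
tx.commit()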
Now creating the relationships should be fast, because the constraints (and their indexes) make the MATCH fast:
tx = graph.cypher.begin()
# A single MATCH with two comma-separated patterns; both lookups use the constraint indexes.
statement = "MATCH (city:City {Name: {City_Name}}), (n:Neighborhood {Name: {Neighborhood_Name}}) CREATE (n)-[:IN]->(city)"
for city_name, neighborhood_names in city_neighborhood_map.items():
    for neighborhood_name in neighborhood_names:
        tx.append(statement, {"City_Name": city_name, "Neighborhood_Name": neighborhood_name})
tx.commit()