Optimizing py2neo's Cypher inserts

Time: 2015-06-03 00:22:47

Tags: python neo4j graph-databases py2neo

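I am using py2neo to import several hundred thousand nodes. I've created a defaultdict that maps cities to their neighborhoods. One motivation was to import these relationships more efficiently, after I had no success with Neo4j's load tool.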

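Because the batch documentation recommends against using batches, I veered away from an implementation like the one by the OP of this post. The documentation suggests using Cypher instead. However, I like being able to create nodes from the defaultdict I have built. I also found it too difficult to import this information the way the first link demonstrates.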

To speed up the import, should I create a Cypher transaction (and commit every 10,000) instead of using the following loop?

for city_name, neighborhood_names in city_neighborhood_map.iteritems():
    # One lookup per city, then one create call per neighborhood
    city_node = graph.find_one(label="City", property_key="Name", property_value=city_name)
    for neighborhood_name in neighborhood_names:
        neighborhood_node = Node("Neighborhood", Name=neighborhood_name)
        rel = Relationship(neighborhood_node, "IN", city_node)
        graph.create(rel)

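When I run the following, I get a timeout and it appears to be slow. Even when I break up the transaction so that it commits every 1,000 neighborhoods, it still runs very slowly.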

tx = graph.cypher.begin()
statement = "MERGE (city {Name:{City_Name}}) CREATE (neighborhood { Name : {Neighborhood_Name}}) CREATE (neighborhood)-[:IN]->(city)"
for city_name, neighborhood_names in city_neighborhood_map.iteritems():
    for neighborhood_name in neighborhood_names:
        tx.append(statement, {"City_Name": city_name, "Neighborhood_Name": neighborhood_name})
tx.commit()

It would be great to keep a pointer to each city, so that I don't need to look it up every time I merge.

1 Answer:

Answer 0: (score: 2)

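It may be faster to do this in two passes: first CREATE all the nodes with uniqueness constraints in place (this should be very fast), then CREATE the relationships in a second pass.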
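Both passes need the map flattened first: the node pass needs each name once, and the relationship pass needs every (city, neighborhood) pair. The following is a minimal sketch of that preparation step, assuming `city_neighborhood_map` maps each city name to an iterable of neighborhood names. The helper name `flatten_city_map` is hypothetical and not part of py2neo.

```python
from collections import defaultdict


def flatten_city_map(city_neighborhood_map):
    """Split a {city: [neighborhoods]} map into inputs for the two passes.

    Returns (city_names, neighborhood_names, pairs):
      - city_names: each city once, sorted
      - neighborhood_names: each distinct neighborhood name once, sorted
      - pairs: every (city, neighborhood) combination, for the relationship pass
    """
    city_names = sorted(city_neighborhood_map)
    # A set removes names that occur under more than one city.
    neighborhood_names = sorted(
        {name for names in city_neighborhood_map.values() for name in names}
    )
    pairs = [
        (city, name)
        for city in city_names
        for name in city_neighborhood_map[city]
    ]
    return city_names, neighborhood_names, pairs


# Illustrative data only
example = defaultdict(list)
example["Boston"].extend(["Back Bay", "Downtown"])
example["Seattle"].append("Downtown")
cities, neighborhoods, pairs = flatten_city_map(example)
```

Note that with a uniqueness constraint on the neighborhood Name, two cities that each have a "Downtown" end up sharing a single Neighborhood node. If that is not what you want, the neighborhood key would need to include the city.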

Constraints first, using the labels City and Neighborhood, which makes the MATCH faster later on:

graph.schema.create_uniqueness_constraint('City', 'Name')
graph.schema.create_uniqueness_constraint('Neighborhood', 'Name')

Create all the nodes:

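tx = graph.cypher.begin()

statement = "CREATE (:City {Name: {name}})"
for city_name in city_neighborhood_map.keys():
    tx.append(statement, {"name": city_name})

# Collect all neighborhood names across every city; the set drops duplicates,
# which would otherwise violate the uniqueness constraint.
all_neighborhood_names = set(
    name for names in city_neighborhood_map.values() for name in names
)

statement = "CREATE (:Neighborhood {Name: {name}})"
for neighborhood_name in all_neighborhood_names:
    tx.append(statement, {"name": neighborhood_name})

tx.commit()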

Now creating the relationships should be fast (the MATCH is fast thanks to the constraints/indexes):

tx = graph.cypher.begin()
# Both patterns go in one MATCH clause, separated by a comma
statement = "MATCH (city:City {Name: {City_Name}}), (n:Neighborhood {Name: {Neighborhood_Name}}) CREATE (n)-[:IN]->(city)"
for city_name, neighborhood_names in city_neighborhood_map.iteritems():
    for neighborhood_name in neighborhood_names:
        tx.append(statement, {"City_Name": city_name, "Neighborhood_Name": neighborhood_name})

tx.commit()
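If a single transaction with hundreds of thousands of statements still times out, the commits can be split into batches, as the question suggests. The following is a rough sketch, not a tuned implementation. It assumes the same py2neo 2.0 style API used above (`graph.cypher.begin()`, `tx.append()`, `tx.commit()`). The helper name `commit_in_batches` and the default batch size are hypothetical.

```python
def commit_in_batches(graph, statement, rows, batch_size=1000):
    """Append `statement` once per parameter dict in `rows`,
    committing after every `batch_size` statements.

    Assumes graph.cypher.begin() returns a transaction object with
    append(statement, parameters) and commit(), as in the code above.
    Returns the number of commits performed.
    """
    commits = 0
    pending = 0
    tx = None
    for params in rows:
        if tx is None:
            # Open a transaction only when there is something to send
            tx = graph.cypher.begin()
        tx.append(statement, params)
        pending += 1
        if pending >= batch_size:
            tx.commit()
            commits += 1
            pending = 0
            tx = None
    if tx is not None:
        # Commit the final, partially filled batch
        tx.commit()
        commits += 1
    return commits
```

For the relationship pass, `rows` could be a generator that yields `{"City_Name": city_name, "Neighborhood_Name": neighborhood_name}` for every pair in `city_neighborhood_map`. The best batch size depends on the server and the data, so treat 1000 as a starting point to measure against.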