py2neo, neo4j: How to handle large IO operations

Time: 2014-05-13 11:24:33

Tags: python neo4j py2neo

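I have 473,639 nodes and 995,863 parent->child relationships in a MySQL table.

I fetch the data and create the nodes and relationships, and I have tried both normal operations and batch operations, but both are very slow. Is there a way to make this process faster?

The code is as follows: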

import MySQLdb as my
from py2neo import neo4j, node, rel

def conn(query):
    db = my.connect(host='localhost',
                    user='root',
                    passwd='root',
                    db='localdb')
    cur = db.cursor()
    cur.execute(query)
    return db, cur

query = 'select * from table1'
db, cur = conn(query)
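d = dict()   # maps a category name to the node created for it
rels = []    # (parent node, child node) pairs, filled in below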

graph = neo4j.GraphDatabaseService()
batch = neo4j.WriteBatch(graph)


def create_node(a):
    if a not in d:
        try:
            A = graph.create(node(name=str(a)))

            # for batch operation
            #A = batch.create(node(name=str(a)))

            d[a] = A
        except Exception as e:
            print(e)
    else:
        A = d[a]
    return A

cnt = 1

# create node

for row in cur.fetchall():
    # get_cat is a helper that is not shown in this snippet
    a, b = get_cat(row[0]), get_cat(row[1])
    try:
        A, B = create_node(a), create_node(b)
        rels.append((A, B))
    except Exception as e:
        print(e)


#create relations

for a, b in rels:
    graph.create(rel(a, "is parent of", b))

    # for batch operation
    #batch.create(rel(a, "is parent of", b))


#res = batch.submit()
#print res

print('end')

1 answer:

Answer 0: (score: 0)

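Batches

Batches are much faster than creating nodes one at a time. But if you run a batch, you should submit it every few hundred items, because a batch slows down when it gets large. Try something like: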

from py2neo import neo4j, rel

graph = neo4j.GraphDatabaseService()
batch = neo4j.WriteBatch(graph)

results = []

# count from 1 so that the first submit happens after 500 items
for i, (a, b) in enumerate(rels, 1):
    batch.create(rel(a, "is parent of", b))

    # submit every 500 items
    if i % 500 == 0:
        # collect results in list
        results.extend(batch.submit())
        # start a fresh, empty batch
        batch = neo4j.WriteBatch(graph)

# submit last items         
results.extend(batch.submit())
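
The same pattern can be factored into a small helper so the chunking logic lives in one place. This is only a sketch: `chunked` and `submit_in_batches` are hypothetical names, not part of py2neo, and `make_batch` and `make_rel` are assumed factories standing in for something like `lambda: neo4j.WriteBatch(graph)` and `lambda a, b: rel(a, "is parent of", b)`.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        # last, possibly shorter, chunk
        yield chunk


def submit_in_batches(rels, make_batch, make_rel, size=500):
    """Create one relationship per (a, b) pair, submitting every `size` items.

    make_batch: hypothetical factory returning a fresh batch object that
                offers .create() and .submit()
    make_rel:   hypothetical factory building the relationship for a pair
    """
    results = []
    for chunk in chunked(rels, size):
        batch = make_batch()
        for a, b in chunk:
            batch.create(make_rel(a, b))
        results.extend(batch.submit())
    return results
```

Because the batch object is passed in through a factory, the batch size can be changed in one place and the helper can be exercised without a running database.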

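Cypher transactions

A good alternative is Cypher transactions. For me they run faster, but you have to write Cypher queries. For simple item creation this is certainly more complicated than using py2neo nodes/rels, but it may come in handy for other operations (e.g. MERGE to update nodes). Note that you also have to call .execute() on the transaction regularly; it slows down if the transaction gets too large.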

from py2neo import cypher

session = cypher.Session("http://localhost:7474")
tx = session.create_transaction()

# send three statements for execution but leave the transaction open
tx.append("MERGE (a:Person {name:'Alice'}) "
          "RETURN a")
tx.append("MERGE (b:Person {name:'Bob'}) "
          "RETURN b")
tx.append("MATCH (a:Person), (b:Person) "
          "WHERE a.name = 'Alice' AND b.name = 'Bob' "
          "CREATE UNIQUE (a)-[ab:KNOWS]->(b) "
          "RETURN ab")
tx.execute()
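
For a bulk load, the statements would normally be generated in a loop rather than written out by hand. The sketch below assumes that `tx.append` accepts a parameter dictionary as its second argument and that `tx.commit()` sends any pending statements before committing, which is how I understand the py2neo 1.6 transaction API; check both against the version you have installed. The `Category` label, the `IS_PARENT_OF` relationship type, and the name `load_pairs` are made up for illustration, and the `{param}` placeholder syntax is the one used by Neo4j 2.x.

```python
# One parameterized statement, reused for every (parent, child) pair.
STATEMENT = (
    "MERGE (a:Category {name: {parent}}) "
    "MERGE (b:Category {name: {child}}) "
    "CREATE UNIQUE (a)-[:IS_PARENT_OF]->(b)"
)


def load_pairs(tx, pairs, every=1000):
    """Append one statement per (parent, child) pair and flush periodically.

    tx is assumed to behave like a py2neo Cypher transaction, i.e. to offer
    append(statement, parameters), execute() and commit().
    """
    pending = 0
    for parent, child in pairs:
        tx.append(STATEMENT, {"parent": parent, "child": child})
        pending += 1
        if pending == every:
            # send the queued statements but keep the transaction open
            tx.execute()
            pending = 0
    # send whatever is left and commit the transaction
    tx.commit()
```

Passing values as parameters instead of building the query string by hand avoids quoting problems with names that contain apostrophes.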

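With both transactions and batches, I write millions of nodes/relationships in a couple of minutes. You have to try different batch/transaction sizes (e.g. from 100 to 5000); I think the best size depends on how much memory neo4j is using.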
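
One way to pick a size is to time the same write routine on a sample of the data at several sizes. A rough sketch follows; `time_sizes` and `write_chunk` are hypothetical names, and `write_chunk` stands in for whatever submits one batch or executes one group of statements.

```python
import time


def time_sizes(items, sizes, write_chunk):
    """Return {size: seconds} for writing `items` in chunks of each size.

    write_chunk(chunk) is a hypothetical callable that performs the actual
    write for one chunk, e.g. builds and submits one WriteBatch.
    """
    timings = {}
    for size in sizes:
        start = time.time()
        for offset in range(0, len(items), size):
            write_chunk(items[offset:offset + size])
        timings[size] = time.time() - start
    return timings
```

It could be called as `time_sizes(sample, [100, 500, 1000, 5000], write_chunk)`. Each pass writes the whole sample again, so it is best run against a throwaway database.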