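I have 473,639 nodes and 995,863 parent->child relationships in a MySQL table.
I fetch the data and create the nodes and relationships using both normal and batch operations, but both are slow. Is there a way to make this process faster?
The code is as follows: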
import MySQLdb as my
from py2neo import neo4j, node, rel

def conn(query):
    db = my.connect(host='localhost',
                    user='root',
                    passwd='root',
                    db='localdb')
    cur = db.cursor()
    cur.execute(query)
    return db, cur

query = 'select * from table1'
db, cur = conn(query)
d = dict()   # cache: name -> node that was already created
rels = []    # (parent, child) pairs collected while creating nodes
graph = neo4j.GraphDatabaseService()
batch = neo4j.WriteBatch(graph)

def create_node(a):
    if a not in d:
        try:
            A = graph.create(node(name=str(a)))
            # for batch operation
            #A = batch.create(node(name=str(a)))
            d[a] = A
        except Exception as e:
            print(e)
    else:
        A = d[a]
    return A

cnt = 1
# create node
for row in cur.fetchall():
    # get_cat() is not defined in the posted code
    a, b = get_cat(row[0]), get_cat(row[1])
    try:
        A, B = create_node(a), create_node(b)
        rels.append((A, B))
    except Exception as e:
        print(e)

#create relations
for item in rels:
    a = item[0]
    b = item[1]
    graph.create(rel(a, "is parent of", b))
    # for batch operation
    #batch.create(node(name=str(a)))
#res = batch.submit()
#print res
print('end')
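Answer 0 (score: 0)
Batching is much faster than creating nodes one at a time. But if you run a batch, you should submit it every few hundred items, because a batch slows down as it grows large. Try something like: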
graph = neo4j.GraphDatabaseService()
batch = neo4j.WriteBatch(graph)
results = []
# count from 1 so the first submit happens after 500 items, not after the first one
for i, item in enumerate(rels, 1):
    a = item[0]
    b = item[1]
    batch.create(rel(a, "is parent of", b))
    # submit every 500 steps
    if i % 500 == 0:
        # collect results in list
        results.extend(batch.submit())
        # reinitialize and clear batch
        batch = neo4j.WriteBatch(graph)
# submit last items
results.extend(batch.submit())
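The submit-every-N pattern above can also be factored into a small helper that is independent of py2neo. This is only a sketch: `chunked` and `submit_in_chunks` are hypothetical names, and `submit_chunk` stands for whatever callable fills and submits one batch (for example, a function that builds a fresh `WriteBatch`, adds the chunk's relationships and returns `batch.submit()`).

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    if size <= 0:
        raise ValueError("size must be positive")
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    # yield the remainder, if the total is not a multiple of `size`
    if chunk:
        yield chunk


def submit_in_chunks(items, size, submit_chunk):
    """Call `submit_chunk` once per chunk and collect everything it returns.

    `submit_chunk` is a hypothetical callable supplied by the caller; it
    receives one list of items and returns a list of results.
    """
    results = []
    for chunk in chunked(items, size):
        results.extend(submit_chunk(chunk))
    return results
```

Keeping the chunking separate from the database calls makes it easy to change the chunk size later without touching the loop body.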
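A good alternative is Cypher transactions. For me they run faster, but you have to write Cypher queries. For simple item creation this is clearly more complicated than using py2neo nodes/rels, but it can come in handy for other operations (e.g. MERGE to update nodes). Note that you also have to .execute() the transaction periodically; if a transaction grows too large, it slows down.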
from py2neo import cypher

session = cypher.Session("http://localhost:7474")
tx = session.create_transaction()
# send three statements for execution but leave the transaction open
tx.append("MERGE (a:Person {name:'Alice'}) "
          "RETURN a")
tx.append("MERGE (b:Person {name:'Bob'}) "
          "RETURN b")
tx.append("MATCH (a:Person), (b:Person) "
          "WHERE a.name = 'Alice' AND b.name = 'Bob' "
          "CREATE UNIQUE (a)-[ab:KNOWS]->(b) "
          "RETURN ab")
tx.execute()
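For the parent/child data in the question, the statements would probably be generated rather than written by hand. The sketch below only builds (statement, parameters) pairs in plain Python; the `Category` label, the `IS_PARENT_OF` relationship type and the function name are assumptions made for illustration, and the `{parent}` placeholder style is the parameter syntax of the older Cypher versions that match this py2neo release. Each pair could then be handed to `tx.append(statement, parameters)`, assuming the installed py2neo version accepts a parameter dictionary there.

```python
# Hypothetical statement: label and relationship type are not from the question.
STATEMENT = (
    "MERGE (a:Category {name: {parent}}) "
    "MERGE (b:Category {name: {child}}) "
    "CREATE UNIQUE (a)-[:IS_PARENT_OF]->(b)"
)


def build_statements(pairs):
    """Turn (parent, child) name pairs into (statement, parameters) tuples."""
    return [
        (STATEMENT, {"parent": str(parent), "child": str(child)})
        for parent, child in pairs
    ]
```

Using parameters instead of formatting the names into the query string avoids quoting problems with names that contain apostrophes.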
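With transactions and batches I write millions of nodes/relationships in a few minutes. You have to experiment with different batch/transaction sizes (e.g. from 100 to 5000); I think the best value depends on how much memory Neo4j is using.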
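One rough way to compare sizes is to time the same sample of items at each candidate size. This is a minimal sketch, not a benchmark: `time_batch_sizes` is a hypothetical helper, and `submit_chunk` is again a caller-supplied callable that writes one chunk. Since every size writes the whole sample once, it is best run against a scratch database.

```python
import time


def time_batch_sizes(items, sizes, submit_chunk):
    """Return {size: seconds} for submitting `items` in chunks of each size.

    `submit_chunk` is a hypothetical callable that writes one list of items.
    """
    timings = {}
    for size in sizes:
        start = time.time()
        # walk the sample in slices of `size` items
        for offset in range(0, len(items), size):
            submit_chunk(items[offset:offset + size])
        timings[size] = time.time() - start
    return timings
```

A call such as `time_batch_sizes(rels[:20000], [100, 500, 1000, 5000], submit_chunk)` would give one timing per size to compare.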