Question

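Say I have a csv file containing node information, where each line has a unique id (the first column), and another csv file containing the edges, which describes the edges between nodes via their unique ids. The following cypher code successfully loads the nodes and then creates the edges. However, can I make it more efficient? My real data set has millions of nodes and tens of millions of edges. Obviously I should use periodic commits and create an index, but can I somehow avoid the match for every edge and use the fact that I already know the unique node ids for each edge I want to build? Or am I going about this all wrong? I would like to do this entirely in cypher (no java).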

load csv from 'file:///home/user/nodes.txt' as line
create (:foo { id: toInt(line[0]), name: line[1], someprop: line[2]});

load csv from 'file:///home/user/edges.txt' as line
match (n1:foo { id: toInt(line[0])} ) 
with n1, line
match (n2:foo { id: toInt(line[1])} ) 
// if I had an index I'd use it here with: using index n2:foo(id)
merge (n1) -[:bar]-> (n2) ;

match p = (n)-->(m) return p;

nodes.txt:

0,node0,Some Property 0
1,node1,Some Property 1
2,node2,Some Property 2
3,node3,Some Property 3
4,node4,Some Property 4
5,node5,Some Property 5
6,node6,Some Property 6
7,node7,Some Property 7
8,node8,Some Property 8
9,node9,Some Property 9
10,node10,Some Property 10
...

edges.txt:

0,2
0,4
0,8
0,13
1,4
1,8
1,15
2,4
2,6
3,4
3,7
3,8
3,11
4,10
...

Answer 1

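As Ron commented above, LOAD CSV is probably not the way to go for large data sets, and the csv batch import tool he linked to is great. If you find you cannot easily wedge your csv into a form that works with the batch import tool, the Neo4J BatchInserter API is very simple to use: http://docs.neo4j.org/chunked/stable/batchinsert.html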
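
If you do want to stay in pure cypher, here is a minimal sketch of the two things the question already mentions: periodic commits and an index. It is not a benchmarked recipe. It assumes Neo4j 2.x syntax (to match the toInt() calls in the question), that the ids in nodes.txt really are unique, and that edges.txt contains no duplicate pairs. The batch size of 10000 is an arbitrary choice. As far as I know the per-edge match cannot be skipped in cypher, but with the constraint in place each match becomes an index lookup on foo(id) rather than a label scan.

```cypher
// Assumption: Neo4j 2.x syntax. A uniqueness constraint on :foo(id)
// also creates an index, so the MATCH clauses below can use index lookups.
CREATE CONSTRAINT ON (n:foo) ASSERT n.id IS UNIQUE;

// Load the nodes, committing every 10000 rows (arbitrary batch size)
// so the whole file is not held in a single transaction.
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///home/user/nodes.txt' AS line
CREATE (:foo { id: toInt(line[0]), name: line[1], someprop: line[2] });

// Load the edges. Both endpoints are looked up by their indexed id.
// CREATE is used instead of MERGE on the assumption that edges.txt
// has no duplicate pairs; keep MERGE if duplicates are possible.
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///home/user/edges.txt' AS line
MATCH (n1:foo { id: toInt(line[0]) })
MATCH (n2:foo { id: toInt(line[1]) })
CREATE (n1)-[:bar]->(n2);
```

Create the constraint before loading the edges, otherwise every match falls back to scanning all :foo nodes. Newer Neo4j releases have changed some of this syntax (for example toInteger() in place of toInt()), so check the documentation for your version.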

How do I efficiently load nodes and edges from different files into neo4j using cypher?
