Question

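I want to import CSV files with about 40 million lines into neo4j. For this I am trying to use the "batchimporter" from https://github.com/jexp/batch-import. Maybe the problem is that I provide my own IDs. This is the example: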

nodes.csv

i:id    l:label
315041100    Person
201215100    Person
315041200    Person

rels.csv：

start    end    type    relart
315041100    201215100    HAS_RELATION    30006
315041200    315041100    HAS_RELATION    30006

Content of batch.properties:

use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=1000M
neostore.relationshipstore.db.mapped_memory=5000M
neostore.propertystore.db.mapped_memory=4G
neostore.propertystore.db.strings.mapped_memory=2000M
neostore.propertystore.db.arrays.mapped_memory=1000M
neostore.propertystore.db.index.keys.mapped_memory=1500M
neostore.propertystore.db.index.mapped_memory=1500M
batch_import.node_index.node_auto_index=exact


./import.sh graph.db nodes.csv rels.csv

is processed without errors, but it takes about 60 seconds!

Importing 3 Nodes took 0 seconds 
Importing 2 Relationships took 0 seconds 
Total import time: 54 seconds

When I use smaller IDs - for example 3150411 instead of 315041100 - it takes just 1 second!

Importing 3 Nodes took 0 seconds 
Importing 2 Relationships took 0 seconds 
Total import time: 1 seconds

Actually I would like to use even bigger IDs, with 10 digits. I don't know what I'm doing wrong. Can anyone see the error?

JDK 1.7
batchimporter 2.1.3 (with neo4j 2.1.3)
OS: ubuntu 14.04
Hardware: 8-core Intel CPU, 16GB RAM

Answer 1

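I think the problem is that the batch importer interprets those IDs as actual physical IDs on disk. So the time is spent in the file system, inflating the store files to a size where they can fit those high IDs.

The IDs you are providing are meant to be "internal" to the batch import, right? Although I'm not sure how to tell the batch importer that this is the case.

@michael-hunger any input there?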

Answer 2

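The problem is that those IDs are internal to Neo4j, where they represent disk record IDs. If you provide high values there, Neo4j will create a lot of empty records until it reaches your IDs.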
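To get a feeling for the effect, here is a rough back-of-the-envelope sketch in Python. It assumes a fixed node record size of about 15 bytes (an assumption for illustration; the exact size depends on the Neo4j version) and that the node store has to cover every record from 0 up to the highest ID. The function name is made up for this example.

```python
# Assumed on-disk size of one node record in bytes.
# This is an illustrative assumption, not a value taken from the thread.
NODE_RECORD_BYTES = 15


def node_store_bytes(highest_id, record_bytes=NODE_RECORD_BYTES):
    """Approximate node store size if records 0..highest_id must all exist."""
    return (highest_id + 1) * record_bytes


# Compare the small IDs, the 9-digit IDs from the question, and a 10-digit ID.
for highest_id in (3150412, 315041200, 9999999999):
    size_mib = node_store_bytes(highest_id) / 1024 ** 2
    print(f"highest id {highest_id}: about {size_mib:,.0f} MiB")
```

Under that assumption, three nodes with 9-digit IDs already force several GB of mostly empty records to be written, while the 7-digit IDs need only tens of MB, which would fit the 54 seconds versus 1 second seen in the question.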

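So either you create your node IDs starting from 0 and store your own ID as a normal node property, or you don't provide node IDs at all and only look up nodes via their "business-id-value".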

i:id    id:long    l:label
0    315041100    Person
1    201215100    Person
2    315041200    Person

start:id    end:id    type    relart
0    1    HAS_RELATION    30006
2    0    HAS_RELATION    30006
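If the files are too large to renumber by hand, the mapping above can be produced with a small preprocessing script. The following is only a sketch: it assumes tab-separated input with the two-column node header and four-column relationship header from the question, keeps the whole ID mapping in memory, and uses a hypothetical function name `remap_ids`.

```python
import csv
import io


def remap_ids(nodes_in, rels_in, nodes_out, rels_out):
    """Rewrite tab-separated node and relationship files to dense 0-based ids."""
    id_map = {}  # business id (string) -> dense id (int)

    node_reader = csv.reader(nodes_in, delimiter="\t")
    node_writer = csv.writer(nodes_out, delimiter="\t", lineterminator="\n")
    next(node_reader)  # skip the original header (i:id, l:label)
    node_writer.writerow(["i:id", "id:long", "l:label"])
    for row in node_reader:
        if not row:
            continue  # ignore blank lines
        business_id, label = row
        if business_id not in id_map:
            id_map[business_id] = len(id_map)
        # Keep the original id as an ordinary node property.
        node_writer.writerow([id_map[business_id], business_id, label])

    rel_reader = csv.reader(rels_in, delimiter="\t")
    rel_writer = csv.writer(rels_out, delimiter="\t", lineterminator="\n")
    header = next(rel_reader)
    rel_writer.writerow(["start:id", "end:id"] + header[2:])
    for row in rel_reader:
        if not row:
            continue
        start, end, *rest = row
        rel_writer.writerow([id_map[start], id_map[end]] + rest)


# Small in-memory demo using the sample data from the question.
nodes = io.StringIO(
    "i:id\tl:label\n315041100\tPerson\n201215100\tPerson\n315041200\tPerson\n"
)
rels = io.StringIO(
    "start\tend\ttype\trelart\n"
    "315041100\t201215100\tHAS_RELATION\t30006\n"
    "315041200\t315041100\tHAS_RELATION\t30006\n"
)
nodes_out, rels_out = io.StringIO(), io.StringIO()
remap_ids(nodes, rels, nodes_out, rels_out)
print(nodes_out.getvalue())
print(rels_out.getvalue())
```

For the real 40-million-line files you would pass opened file objects (opened with `newline=""`) instead of `io.StringIO`. The dictionary holds one entry per node, so memory use grows with the number of distinct IDs.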

Or you have to configure and use an index:

id:long:people    l:label
315041100    Person
201215100    Person
315041200    Person

id:long:people    id:long:people    type    relart
315041100    201215100    HAS_RELATION    30006
315041200    315041100    HAS_RELATION    30006

HTH Michael

Alternatively, you can also just write a small Java or Groovy program to import your data, if handling those IDs with the batch importer is too tricky. See: http://jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/

neo4j batchimporter is slow with big IDs
