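LOAD CSV in Neo4j fails with java.lang.OutOfMemoryError: GC overhead limit exceeded

Time: 2015-06-10 06:57:21

Tags: neo4j cypher

I am trying to load CSV data into an embedded Neo4j database (v2.1.7, on Windows). The CSV file has 1,000,000 rows, and the data model is simple:

CSV data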

"num1","num2","datatime"
"13931345724","18409958023","2014-12-31 12:00:00"
"13931345724","13710622859","2014-12-31 12:00:00"
"13931345724","18919875049","2014-12-31 12:00:00"
"13931345724","13460873081","2014-12-31 12:00:00"
...

Cypher load statement

USING PERIODIC COMMIT 5000 
LOAD CSV FROM 'file:C:/tmpFiles/calls100w.csv' AS line FIELDTERMINATOR ','
 WITH line 
MERGE (n0:Phone:_Phone {phoneNumber : line[0]}) 
MERGE (n1:Phone:_Phone {phoneNumber : line[1]}) 
MERGE (n0)-[:CALL{callAt: line[2]}]->(n1) 

It ran for a long time and then failed with this exception:

java.lang.OutOfMemoryError: GC overhead limit exceeded

I tried adding a neo4j-wrapper.conf file to the database location folder, but it seems to have no effect.

neo4j-wrapper.conf

wrapper.java.additional.1=-XX:-UseConcMarkSweepGC
wrapper.java.additional.1=-Xloggc:c:/neo4jdb/log/neo4j-gc.log
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096

This is what the messages.log file shows:

2015-06-09 09:55:00.513+0000 INFO  [org.neo4j]: System memory information:
    Total Physical memory: 7.70 GB
    Free Physical memory: 1.99 GB
    Committed virtual memory: 361.05 MB
    Total swap space: 14.48 GB
    Free swap space: 5.76 GB
2015-06-09 09:55:00.519+0000 INFO  [org.neo4j]: JVM memory information:
    Free  memory: 135.42 MB
    Total memory: 148.94 MB
    Max   memory: 1.71 GB
    Garbage Collector: PS Scavenge: [PS Eden Space, PS Survivor Space]
    Garbage Collector: PS MarkSweep: [PS Eden Space, PS Survivor Space, PS Old Gen, PS Perm Gen]
    Memory Pool: Code Cache (Non-heap memory): committed=2.44 MB, used=833.00 kB, max=48.00 MB, threshold=0.00 B
    Memory Pool: PS Eden Space (Heap memory): committed=61.63 MB, used=7.28 MB, max=647.06 MB, threshold=?
    Memory Pool: PS Survivor Space (Heap memory): committed=5.13 MB, used=5.12 MB, max=5.13 MB, threshold=?
    Memory Pool: PS Old Gen (Heap memory): committed=82.19 MB, used=1.12 MB, max=1.28 GB, threshold=0.00 B
    Memory Pool: PS Perm Gen (Non-heap memory): committed=20.75 MB, used=12.78 MB, max=82.00 MB, threshold=0.00 B

On Windows, the database location folder has no conf/ folder, so I created one and put neo4j-wrapper.conf in it. Is that the right place for the conf file?

Database location folder

C:\NEO4JDB
|   index.db
|   messages.log
|   neo4j.properties
|   neostore
|   neostore.id
|   neostore.labeltokenstore.db
|   ....
+---conf
|       neo4j-wrapper.conf
+---index
|       lucene-store.db
|       lucene.log.active
|       ...
\---schema
    +---...

2 Answers:

Answer 0 (score: 2)

Split it into 2 imports.

You are running into an issue where Cypher creates an Eager pipe to ensure correct separation. This pulls in all CSV rows eagerly and makes the periodic commit ineffective.

I assume you have an index/constraint on :Phone(phoneNumber). If you split the query into two parts, it will work:

USING PERIODIC COMMIT 5000
LOAD CSV FROM 'file:C:/tmpFiles/calls100w.csv' AS line FIELDTERMINATOR ','
WITH line
MERGE (n0:Phone:_Phone {phoneNumber : line[0]})
MERGE (n1:Phone:_Phone {phoneNumber : line[1]});

USING PERIODIC COMMIT 5000
LOAD CSV FROM 'file:C:/tmpFiles/calls100w.csv' AS line FIELDTERMINATOR ','
WITH line
MATCH (n0:Phone:_Phone {phoneNumber : line[0]})
MATCH (n1:Phone:_Phone {phoneNumber : line[1]})
MERGE (n0)-[:CALL{callAt: line[2]}]->(n1);

Answer 1 (score: 0)

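You may want to try splitting the import into a node import and a relationship import. Assuming your node and relationship files have been deduplicated, you can use CREATE instead of MERGE statements.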

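So, for example, make a "num1.csv" file that contains only the first column (num1) of "calls100w.csv" and remove all duplicates. Make a "num2.csv" file that contains only the second column of "calls100w.csv" and remove its duplicates as well.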
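One possible way to produce those two files is a small Python script; this is only a sketch, not part of the original answer. The function names `unique_columns` and `write_node_files` are made up for illustration. It also makes one assumption beyond the text above: a number that already appears in the first column is left out of num2.csv, because loading both files with CREATE would otherwise create that phone node twice.

```python
import csv


def unique_columns(rows):
    """Return (first, second): unique values of column 0, then the values of
    column 1 that were not already seen, both in first-seen order."""
    rows = list(rows)
    seen = set()
    first, second = [], []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            first.append(row[0])
    for row in rows:
        # Assumption: skip numbers already emitted, so CREATE makes no duplicates.
        if row[1] not in seen:
            seen.add(row[1])
            second.append(row[1])
    return first, second


def write_node_files(src, num1_path, num2_path, has_header=True):
    """Read the calls CSV and write one phone number per line to each output."""
    with open(src, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the "num1","num2","datatime" line
        first, second = unique_columns(r for r in reader if len(r) >= 2)
    for path, values in ((num1_path, first), (num2_path, second)):
        with open(path, "w", newline="") as out:
            csv.writer(out).writerows([v] for v in values)
```

Calling `write_node_files("C:/tmpFiles/calls100w.csv", "C:/tmpFiles/num1.csv", "C:/tmpFiles/num2.csv")` would write header-less files, which matches the `line[0]` indexing used in the LOAD CSV statements.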

Then load the node CSV files:

USING PERIODIC COMMIT 5000 
LOAD CSV FROM 'file:C:/tmpFiles/num1.csv' AS line FIELDTERMINATOR ','
WITH line 
CREATE (n0:Phone:_Phone {phoneNumber : line[0]})

AND

USING PERIODIC COMMIT 5000 
LOAD CSV FROM 'file:C:/tmpFiles/num2.csv' AS line FIELDTERMINATOR ','
WITH line 
CREATE (n1:Phone:_Phone {phoneNumber : line[0]}) 

Then create the index:

CREATE INDEX ON :Phone(phoneNumber)

Now load the original CSV to create the relationships:

USING PERIODIC COMMIT 5000 
LOAD CSV FROM 'file:C:/tmpFiles/calls100w.csv' AS line FIELDTERMINATOR ','
WITH line 
MATCH (n0:Phone:_Phone {phoneNumber : line[0]}) 
MATCH (n1:Phone:_Phone {phoneNumber : line[1]}) 
MERGE (n0)-[:CALL{callAt: line[2]}]->(n1)