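I am trying to load CSV data into an embedded Neo4j database (v2.1.7, on Windows). The CSV file has 1,000,000 rows (1 million). The data model is simple. The data and the import query are shown below: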
"num1","num2","datatime"
"13931345724","18409958023","2014-12-31 12:00:00"
"13931345724","13710622859","2014-12-31 12:00:00"
"13931345724","18919875049","2014-12-31 12:00:00"
"13931345724","13460873081","2014-12-31 12:00:00"
...
USING PERIODIC COMMIT 5000
LOAD CSV FROM 'file:C:/tmpFiles/calls100w.csv' AS line FIELDTERMINATOR ','
WITH line
MERGE (n0:Phone:_Phone {phoneNumber : line[0]})
MERGE (n1:Phone:_Phone {phoneNumber : line[1]})
MERGE (n0)-[:CALL{callAt: line[2]}]->(n1)
It ran for a long time and then failed with this exception:
java.lang.OutOfMemoryError: GC overhead limit exceeded
I tried adding a neo4j-wrapper.conf file to the database location folder, but it seems to have no effect.
wrapper.java.additional.1=-XX:-UseConcMarkSweepGC
wrapper.java.additional.1=-Xloggc:c:/neo4jdb/log/neo4j-gc.log
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096
This is what the messages.log file shows:
2015-06-09 09:55:00.513+0000 INFO [org.neo4j]: System memory information:
Total Physical memory: 7.70 GB
Free Physical memory: 1.99 GB
Committed virtual memory: 361.05 MB
Total swap space: 14.48 GB
Free swap space: 5.76 GB
2015-06-09 09:55:00.519+0000 INFO [org.neo4j]: JVM memory information:
Free memory: 135.42 MB
Total memory: 148.94 MB
Max memory: 1.71 GB
Garbage Collector: PS Scavenge: [PS Eden Space, PS Survivor Space]
Garbage Collector: PS MarkSweep: [PS Eden Space, PS Survivor Space, PS Old Gen, PS Perm Gen]
Memory Pool: Code Cache (Non-heap memory): committed=2.44 MB, used=833.00 kB, max=48.00 MB, threshold=0.00 B
Memory Pool: PS Eden Space (Heap memory): committed=61.63 MB, used=7.28 MB, max=647.06 MB, threshold=?
Memory Pool: PS Survivor Space (Heap memory): committed=5.13 MB, used=5.12 MB, max=5.13 MB, threshold=?
Memory Pool: PS Old Gen (Heap memory): committed=82.19 MB, used=1.12 MB, max=1.28 GB, threshold=0.00 B
Memory Pool: PS Perm Gen (Non-heap memory): committed=20.75 MB, used=12.78 MB, max=82.00 MB, threshold=0.00 B
On Windows, there is no conf/ folder in the database location folder, so I created one and put neo4j-wrapper.conf in it. Is the conf file in the right place?
C:\NEO4JDB
| index.db
| messages.log
| neo4j.properties
| neostore
| neostore.id
| neostore.labeltokenstore.db
| ....
+---conf
| neo4j-wrapper.conf
+---index
| lucene-store.db
| lucene.log.active
| ...
\---schema
+---...
Answer 0: (score: 2)
Split it into two imports.
You are running into the issue that Cypher creates an Eager pipe to ensure that reads and writes are correctly separated. This causes all CSV rows to be pulled in eagerly, which voids the periodic commit.
I assume you have an index/constraint on :Phone(phoneNumber).
If you split the query into two parts, it will work:
USING PERIODIC COMMIT 5000
LOAD CSV FROM 'file:C:/tmpFiles/calls100w.csv' AS line FIELDTERMINATOR ','
WITH line
MERGE (n0:Phone:_Phone {phoneNumber : line[0]})
MERGE (n1:Phone:_Phone {phoneNumber : line[1]});
USING PERIODIC COMMIT 5000
LOAD CSV FROM 'file:C:/tmpFiles/calls100w.csv' AS line FIELDTERMINATOR ','
WITH line
MATCH (n0:Phone:_Phone {phoneNumber : line[0]})
MATCH (n1:Phone:_Phone {phoneNumber : line[1]})
MERGE (n0)-[:CALL{callAt: line[2]}]->(n1);
Answer 1: (score: 0)
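You may want to try splitting the import into a node import and a relationship import. Assuming your node and relationship files are deduplicated, you can use CREATE instead of the MERGE statement.
So, for example, make a "num1.csv" file that contains only the first column (num1) of "calls100w.csv" and remove all duplicates. Make a "num2.csv" file that contains only the second column of "calls100w.csv" and remove its duplicates as well.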
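The deduplicated node files described above could be produced with a small script. This is only a sketch, not part of the original answer: the function name write_unique_column and its parameters are made up for illustration, and it assumes the file fits the sample shown in the question (comma-separated, quoted fields, one header row).

```python
import csv


def write_unique_column(src_path, dst_path, column, skip_header=True, exclude=None):
    """Write the distinct values of one CSV column to dst_path, one per row.

    Hypothetical helper for illustration. Values listed in `exclude` are
    skipped. Returns the set of values that were written.
    """
    exclude = exclude or set()
    seen = set()
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        # Quote every field so the output looks like the input file.
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        if skip_header:
            next(reader, None)  # drop the "num1","num2","datatime" row
        for row in reader:
            if len(row) <= column:
                continue  # ignore short or empty rows
            value = row[column]
            if value in seen or value in exclude:
                continue  # already written, or explicitly excluded
            seen.add(value)
            writer.writerow([value])
    return seen
```

Calling write_unique_column("calls100w.csv", "num1.csv", 0) and then write_unique_column("calls100w.csv", "num2.csv", 1) gives the two files. Note that a number appearing in both columns would end up in both files and so be created twice by two CREATE passes. Passing the set returned by the first call as exclude to the second call is one way to avoid that.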
Then load the node CSV files:
USING PERIODIC COMMIT 5000
LOAD CSV FROM 'file:C:/tmpFiles/num1.csv' AS line FIELDTERMINATOR ','
WITH line
CREATE (n0:Phone:_Phone {phoneNumber : line[0]})
and
USING PERIODIC COMMIT 5000
LOAD CSV FROM 'file:C:/tmpFiles/num2.csv' AS line FIELDTERMINATOR ','
WITH line
CREATE (n1:Phone:_Phone {phoneNumber : line[0]})
Then create the index:
CREATE INDEX ON :Phone(phoneNumber)
Now load the original CSV to create the relationships:
USING PERIODIC COMMIT 5000
LOAD CSV FROM 'file:C:/tmpFiles/calls100w.csv' AS line FIELDTERMINATOR ','
WITH line
MATCH (n0:Phone:_Phone {phoneNumber : line[0]})
MATCH (n1:Phone:_Phone {phoneNumber : line[1]})
MERGE (n0)-[:CALL{callAt: line[2]}]->(n1)