Neo4j import tool - OutOfMemory error: GC overhead limit exceeded

Time: 2016-02-15 01:22:27

Tags: java csv import neo4j garbage-collection

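I'm using the neo4j-import tool (on Windows) to import about 1 million nodes with about 20 million relationships between them; all of the nodes should be unique. The process runs smoothly until it reaches the "Relationship counts" task. It loads all the way to 20M (seemingly all of the relationships), then hangs for a while (30 minutes to 1 hour) and finally fails with "java.lang.OutOfMemoryError: GC overhead limit exceeded".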

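I have successfully loaded a large graph database before (39M nodes, 21M relationships), so I'm not sure what the problem is. Is it because this graph is more densely connected than the one I loaded previously?

Or is there a memory leak? In Task Manager, the Java Platform SE binary process takes more and more memory as the import progresses (up to 12-13 GB of my 16 GB of RAM), especially towards the end. That seems very large, particularly since the 39M node / 21M relationship database imported successfully and relatively quickly with the import tool (without hanging at relationship counts).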
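To put the density question in numbers, here is a small sketch that compares the average number of relationships per node for the two imports. It uses only the rounded counts quoted above; the function name is made up for illustration. It quantifies the difference in density but does not show that density is what causes the error.

```python
def relationships_per_node(node_count, relationship_count):
    """Average number of relationships per node (rounded counts from the post)."""
    return relationship_count / node_count


# Current import: ~1M nodes, ~20M relationships
current = relationships_per_node(1_000_000, 20_000_000)
# Earlier, successful import: 39M nodes, 21M relationships
previous = relationships_per_node(39_000_000, 21_000_000)

print(f"current import:  {current:.2f} relationships per node")
print(f"previous import: {previous:.2f} relationships per node")
```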

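Any ideas about what could be going wrong? Thanks in advance!

In case looking at my node/relationship files helps, here is a link to them: https://drive.google.com/open?id=0Bw7N-SlJA3ZCei0ycEhoa2YwNUU

Here is the neo4j shell output: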

C:\Users\Username\Documents\Neo4j>neo4jImport -into graphDB1.graphdb --nodes D:\concept.csv --relationships D:\predicate.csv --stacktrace --idtype integer
WARNING! This batch script has been deprecated. Please use the provided PowerShell scripts instead: http://neo4j.com/docs/stable/powershell.html
The system cannot find the path specified.
Importing the contents of these files into graphDB1.graphdb:
Nodes:
  D:\concept.csv
Relationships:
      D:\predicate.csv

Available memory:
  Free machine memory: 13.50 GB
  Max heap memory : 12.75 GB

    Nodes
[>:|PR|NOD|*LABEL SCAN---------------------------------|v:6.79 MB/s----------------------------]  1M
Done in 40s 562ms
Prepare node index
[*DETECT:20.37 MB------------------------------------------------------------------------------]  1M
Done in 802ms
Calculate dense nodes
[*>:59.38 MB/s----------------------------------|PREPARE(3)====================================] 20M
Done in 12s 566ms
Relationships
[>:2.01 |PREPARE-----------|P|RELATIONSHI|*v:4.05 MB/s-----------------------------------------] 20M
Done in 6m 3s 655ms
Node --> Relationship
[>:3.19 MB/s--------------------------|L|*v:2.39 MB/s------------------------------------------]  1M
Done in 8s 421ms
Relationship --> Relationship
[*>:6.82 MB/s--------------------------------------|LINK-----------|v:6.82 MB/s----------------] 20M
Done in 1m 36s 849ms
Node counts
[*COUNT:91.55 MB-------------------------------------------------------------------------------]  1M
Done in 3m 35s 21ms
Relationship counts
[*>:8.62 MB/s-----------------------------------------------------------|COUNT-----------------] 20MException in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOf(Unknown Source)
        at java.util.ArrayList.toArray(Unknown Source)
        at java.util.ArrayList.<init>(Unknown Source)
        at org.neo4j.unsafe.impl.batchimport.stats.StepStats.<init>(StepStats.java:39)
        at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.stats(AbstractStep.java:220)
        at org.neo4j.unsafe.impl.batchimport.staging.StageExecution$1.compare(StageExecution.java:123)
        at org.neo4j.unsafe.impl.batchimport.staging.StageExecution$1.compare(StageExecution.java:118)
        at java.util.TimSort.countRunAndMakeAscending(Unknown Source)
        at java.util.TimSort.sort(Unknown Source)
        at java.util.TimSort.sort(Unknown Source)
        at java.util.Arrays.sort(Unknown Source)
        at java.util.Collections.sort(Unknown Source)
        at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stepsOrderedBy(StageExecution.java:117)
        at org.neo4j.unsafe.impl.batchimport.staging.DynamicProcessorAssigner.assignProcessorsToPotentialBottleNeck(DynamicProcessorAssigner.java:94)
        at org.neo4j.unsafe.impl.batchimport.staging.DynamicProcessorAssigner.check(DynamicProcessorAssigner.java:81)
        at org.neo4j.unsafe.impl.batchimport.staging.MultiExecutionMonitor.check(MultiExecutionMonitor.java:106)
        at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.supervise(ExecutionSupervisor.java:65)
        at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisors.superviseExecution(ExecutionSupervisors.java:80)
        at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:224)
        at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:185)
        at org.neo4j.tooling.ImportTool.main(ImportTool.java:363)
        at org.neo4j.tooling.ImportTool.main(ImportTool.java:279)

Update 1:

Here is a thread dump taken at the moment the import hangs at relationship counts:

2016-02-17 08:28:12
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode):

"MuninnPageCache[1]-FlushTask" daemon prio=6 tid=0x0000000026855800 nid=0xfe0 waiting on condition [0x00000000288fe000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000004c0189810> (a org.neo4j.io.pagecache.impl.muninn.MuninnPageCache)
        at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
        at org.neo4j.io.pagecache.impl.muninn.MuninnPageCache.continuouslyFlushPages(MuninnPageCache.java:909)
        at org.neo4j.io.pagecache.impl.muninn.FlushTask.run(FlushTask.java:36)
        at org.neo4j.io.pagecache.impl.muninn.BackgroundTask.run(BackgroundTask.java:45)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"MuninnPageCache[1]-EvictionTask" daemon prio=6 tid=0x0000000026904000 nid=0x3bd4 runnable [0x00000000287fe000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000004c0189810> (a org.neo4j.io.pagecache.impl.muninn.MuninnPageCache)
        at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
        at org.neo4j.io.pagecache.impl.muninn.MuninnPageCache.parkEvictor(MuninnPageCache.java:697)
        at org.neo4j.io.pagecache.impl.muninn.MuninnPageCache.parkUntilEvictionRequired(MuninnPageCache.java:751)
        at org.neo4j.io.pagecache.impl.muninn.MuninnPageCache.continuouslySweepPages(MuninnPageCache.java:732)
        at org.neo4j.io.pagecache.impl.muninn.EvictionTask.run(EvictionTask.java:39)
        at org.neo4j.io.pagecache.impl.muninn.BackgroundTask.run(BackgroundTask.java:45)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"Service Thread" daemon prio=6 tid=0x0000000024ee8000 nid=0x301c runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x0000000024ee6000 nid=0x3060 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x0000000024ee2800 nid=0x2198 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Attach Listener" daemon prio=10 tid=0x0000000024ee2000 nid=0x1ae4 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x0000000024ee1000 nid=0x135c waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=8 tid=0x0000000024ed9000 nid=0x3480 in Object.wait() [0x00000000278ff000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000004c000d4b0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        - locked <0x00000004c000d4b0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

"Reference Handler" daemon prio=10 tid=0x0000000024ed8000 nid=0x1ae8 in Object.wait() [0x00000000277ff000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000004c000d300> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
        - locked <0x00000004c000d300> (a java.lang.ref.Reference$Lock)

"main" prio=6 tid=0x00000000023c2800 nid=0x2e7c waiting on condition [0x00000000023bf000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.neo4j.io.fs.FileUtils.waitAndThenTriggerGC(FileUtils.java:253)
        at org.neo4j.io.fs.FileUtils.deleteFile(FileUtils.java:110)
        at org.neo4j.io.fs.DefaultFileSystemAbstraction.deleteFile(DefaultFileSystemAbstraction.java:127)
        at org.neo4j.kernel.impl.storemigration.FileOperation$3.perform(FileOperation.java:93)
        at org.neo4j.kernel.impl.storemigration.StoreFile.fileOperation(StoreFile.java:267)
        at org.neo4j.tooling.ImportTool.main(ImportTool.java:389)
        at org.neo4j.tooling.ImportTool.main(ImportTool.java:279)

"VM Thread" prio=10 tid=0x0000000024ed1800 nid=0x3058 runnable

"GC task thread#0 (ParallelGC)" prio=6 tid=0x00000000023d7000 nid=0x313c runnable

"GC task thread#1 (ParallelGC)" prio=6 tid=0x00000000023d9000 nid=0x3144 runnable

"GC task thread#2 (ParallelGC)" prio=6 tid=0x00000000023da800 nid=0x974 runnable

"GC task thread#3 (ParallelGC)" prio=6 tid=0x00000000023dc000 nid=0x3a3c runnable

"GC task thread#4 (ParallelGC)" prio=6 tid=0x00000000023de800 nid=0x3684 runnable

"GC task thread#5 (ParallelGC)" prio=6 tid=0x00000000023e1000 nid=0x35b8 runnable

"GC task thread#6 (ParallelGC)" prio=6 tid=0x00000000023e4000 nid=0x3950 runnable

"GC task thread#7 (ParallelGC)" prio=6 tid=0x00000000023e5800 nid=0x318c runnable

"GC task thread#8 (ParallelGC)" prio=6 tid=0x00000000023e8800 nid=0x30b8 runnable

"GC task thread#9 (ParallelGC)" prio=6 tid=0x00000000023e9800 nid=0x32dc runnable

"VM Periodic Task Thread" prio=10 tid=0x0000000024eed800 nid=0x3710 waiting on condition

JNI global references: 377

Heap
 PSYoungGen      total 2071552K, used 0K [0x0000000780000000, 0x0000000800000000, 0x0000000800000000)
  eden space 2043904K, 0% used [0x0000000780000000,0x0000000780000000,0x00000007fcc00000)
  from space 27648K, 0% used [0x00000007fe500000,0x00000007fe500000,0x0000000800000000)
  to   space 25600K, 0% used [0x00000007fcc00000,0x00000007fcc00000,0x00000007fe500000)
 ParOldGen       total 11534336K, used 10982258K [0x00000004c0000000, 0x0000000780000000, 0x0000000780000000)
  object space 11534336K, 95% used [0x00000004c0000000,0x000000075e4dcb50,0x0000000780000000)
 PSPermGen       total 21504K, used 13521K [0x00000004bae00000, 0x00000004bc300000, 0x00000004c0000000)
  object space 21504K, 62% used [0x00000004bae00000,0x00000004bbb34588,0x00000004bc300000)

2016-02-17 08:28:20
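The heap summary at the end of the dump can be read off with a short script. This is a minimal sketch that parses the two generation lines copied from the dump and reports how full each one is; the regular expression and function name are illustrative. HotSpot raises "GC overhead limit exceeded" when nearly all of its time goes to garbage collection while very little heap is recovered (by default roughly 98% of the time for less than 2% of the heap), which fits an old generation that stays almost full.

```python
import re

# The two generation lines, copied from the heap summary in the thread dump.
HEAP_LINES = """\
 PSYoungGen      total 2071552K, used 0K [0x0000000780000000, 0x0000000800000000, 0x0000000800000000)
 ParOldGen       total 11534336K, used 10982258K [0x00000004c0000000, 0x0000000780000000, 0x0000000780000000)
"""

PATTERN = re.compile(r"^\s*(\w+)\s+total (\d+)K, used (\d+)K", re.MULTILINE)


def occupancy(heap_text):
    """Return {generation: (used_kib, total_kib, fraction_used)}."""
    result = {}
    for name, total, used in PATTERN.findall(heap_text):
        total_kib, used_kib = int(total), int(used)
        result[name] = (used_kib, total_kib, used_kib / total_kib)
    return result


for name, (used, total, fraction) in occupancy(HEAP_LINES).items():
    print(f"{name}: {used / 1024**2:.2f} GiB of {total / 1024**2:.2f} GiB ({fraction:.0%})")
```

The old generation is about 95% full while the young generation is empty. That suggests the memory is held by long-lived objects, not by short-lived garbage.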

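1 answer:

Answer 0 (score: 0)

This is odd on such a small dataset. How many distinct relationship types and labels do you expect in this dataset? Could you also provide a thread dump taken when it happens?

Edit: The problem was that a column containing property values was being used as the LABEL column. This mistakenly produced a massive number of labels, and the counting step does not scale with that.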
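One way to catch this before running an import is to count the distinct labels in the node file. The sketch below is written under assumptions: the column names in the sample are hypothetical (the real header of concept.csv is not shown here), labels sit in a header field ending in `:LABEL`, and multiple labels in one cell are separated by `;`. The function name `count_labels` is illustrative and not part of any Neo4j tooling.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, array_delimiter=";"):
    """Count how many nodes carry each label in a neo4j-import style node file."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Columns whose header field ends in ":LABEL" are treated as label columns.
    label_columns = [i for i, field in enumerate(header) if field.strip().endswith(":LABEL")]
    counts = Counter()
    for row in reader:
        for i in label_columns:
            if i < len(row) and row[i]:
                counts.update(label for label in row[i].split(array_delimiter) if label)
    return counts


# Hypothetical sample: a property-like value (the concept name) sits in the label column.
SAMPLE = io.StringIO(
    "conceptId:ID,:LABEL\n"
    "1,aspirin\n"
    "2,fever\n"
    "3,ibuprofen\n"
    "4,fever\n"
)

label_counts = count_labels(SAMPLE)
print(len(label_counts), "distinct labels")
print(label_counts.most_common(5))
```

For a real file, pass an open file handle instead of the sample, for example `open(r"D:\concept.csv", newline="", encoding="utf-8")` (the encoding is an assumption). If the number of distinct labels is close to the number of nodes, the label column most likely holds property values.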