Neo4j traversal runs out of memory

Time: 2014-04-14 20:04:04

Tags: java neo4j

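I have a Neo4j database with roughly 120 million nodes. I am using the traversal framework to walk my graph and count occurrences of certain nodes. This works like a charm. Unfortunately, when I run the code on the entire dataset, I run out of memory.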

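I have allocated 4 GB to the Java VM, and I believe I am committing my transactions (by calling tx.success() inside a try-with-resources statement), but my heap still fills up very quickly.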

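You can find my code below. First I collect about 40 versions (these are the root nodes). Then, for each of them, I look up all adjacent child nodes. For each of these child nodes (files), I check the whole subtree for occurrences of a certain node.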

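My understanding is that using

    try (Transaction tx = db.beginTx()) {
    }

closes my transaction automatically, but my heap still fills up. This makes my query slow down from the second or third version onward and eventually crash. Am I misunderstanding something? Or is there anything else I can do?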

    Collection<Node> versions;
    Collection<Node> files;
    Collection<Node> nodes;
    try ( Transaction ignored = db.beginTx() )
    {
        versions = IteratorUtil.asCollection(db.traversalDescription().breadthFirst().relationships(ProjectRelations.HASVERSION, Direction.OUTGOING).evaluator(Evaluators.toDepth(1)).evaluator(Evaluators.excludeStartPosition()).traverse(db.getNodeById(0)).nodes());
        ignored.success();
    }

    for(Node v : versions){
        int fors = 0;
        test = 0;

        try( Transaction tx = db.beginTx()){

            files = IteratorUtil.asCollection(db.traversalDescription().breadthFirst().relationships(ProjectRelations.FILES, Direction.OUTGOING).evaluator(Evaluators.excludeStartPosition()).traverse(v).nodes());

            tx.success();
        }

        for( Node f : files ) {

            try (Transaction t = db.beginTx()){
                int i = 0;
                for(Node node : db.traversalDescription().depthFirst().relationships(RelTypes.ISPARENTOF, Direction.OUTGOING).evaluator(Evaluators.excludeStartPosition()).evaluator(e).traverse(f).nodes()){
                     //do some stuff
                }
                t.success();
            }   
        }

        files.clear();


    }
    versions.clear();

Update

I replaced everything with iterators, like this:

        try (
            Transaction tx = db.beginTx();
            ResourceIterator<Node> files = db.traversalDescription().breadthFirst().relationships(ProjectRelations.FILES, Direction.OUTGOING).evaluator(Evaluators.excludeStartPosition()).traverse(v).nodes().iterator();
        ) {
            int idx = 0;
            forloops = 0;
            long start = System.nanoTime();

            while (files.hasNext()) {
                Node f = files.next();

                try (
                    Transaction t = db.beginTx();
                    ResourceIterator<Node> blah = db.traversalDescription().depthFirst().relationships(RelTypes.ISPARENTOF, Direction.OUTGOING).evaluator(Evaluators.excludeStartPosition()).evaluator(e).traverse(f).nodes().iterator();
                ) {
                    int i = 0;

                    while (blah.hasNext()) {
                        Node tempNode = blah.next();
                    }
                    blah.close();
                }
            }
            files.close();
        }

    }

The problem is that the transaction keeps everything in memory until I either exhaust the iterator or close() it.

Edit 2:

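I now use iterators for everything, with depth-first traversal. I also changed the available heap from 4 GB to 1024 MB. It now seems to be running (though I am not sure it will run to completion), albeit very slowly. Heap usage climbs to about 980 MB but has not exceeded that threshold (yet). Since my heap is full the whole time, there is of course a huge slowdown. Any ideas for improvement? Or is this my best option?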

    try(Transaction tx = db.beginTx()){
    versions = IteratorUtil.asCollection(db
                .traversalDescription()
                .depthFirst()
                .relationships(ProjectRelations.HASVERSION,
                        Direction.OUTGOING)
                .evaluator(Evaluators.toDepth(1))
                .evaluator(Evaluators.excludeStartPosition())
                .traverse(root));

    }
    int mb = 1024 * 1024;
    Runtime runtime = Runtime.getRuntime();

    ResourceIterator<Node> files = null;

    try(Transaction tx = db.beginTx()){
        int idx = 0;
        for(Relationship rel : root.getRelationships(ProjectRelations.HASVERSION, Direction.OUTGOING)){
            idx++;
            System.out.println(idx);
            Node v = rel.getEndNode();
            files = db.traversalDescription().depthFirst().relationships(ProjectRelations.FILES, Direction.OUTGOING).evaluator(Evaluators.excludeStartPosition()).uniqueness(Uniqueness.NONE).traverse(v).nodes().iterator();
            long start = System.nanoTime();
            while(files.hasNext()){

                Node f = files.next();

                ResourceIterator<Node> node = db.traversalDescription().depthFirst().relationships(RelTypes.ISPARENTOF, Direction.OUTGOING).evaluator(Evaluators.excludeStartPosition()).evaluator(e).traverse(f).nodes().iterator();
                while(node.hasNext()){
                    node.next();
                }

            }
            System.out.println("Used Memory:"
                    + (runtime.totalMemory() - runtime.freeMemory()) / mb);
            System.out
                    .println("Total Memory:" + runtime.totalMemory() / mb);


            files.close();
        }

    }

    db.shutdown();

The exception thrown:

Exception in thread "GC-Monitor" Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at ch.qos.logback.core.pattern.FormattingConverter.write(FormattingConverter.java:40)
at ch.qos.logback.core.pattern.PatternLayoutBase.writeLoopOnConverters(PatternLayoutBase.java:119)
at ch.qos.logback.classic.PatternLayout.doLayout(PatternLayout.java:168)
at ch.qos.logback.classic.PatternLayout.doLayout(PatternLayout.java:59)
at ch.qos.logback.core.encoder.LayoutWrappingEncoder.doEncode(LayoutWrappingEncoder.java:134)
at ch.qos.logback.core.OutputStreamAppender.writeOut(OutputStreamAppender.java:188)
at ch.qos.logback.core.FileAppender.writeOut(FileAppender.java:206)
at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:212)
at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:103)
at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:88)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:272)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:259)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:441)
at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:395)
at ch.qos.logback.classic.Logger.warn(Logger.java:708)
at org.neo4j.kernel.logging.LogbackService$Slf4jToStringLoggerAdapter.warn(LogbackService.java:240)
at org.neo4j.kernel.impl.cache.MeasureDoNothing.run(MeasureDoNothing.java:84)

java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(Unknown Source)
at org.neo4j.kernel.impl.core.RelationshipLoader.getMoreRelationships(RelationshipLoader.java:55)
at org.neo4j.kernel.impl.core.NodeManager.getMoreRelationships(NodeManager.java:779)
at org.neo4j.kernel.impl.core.NodeImpl.loadMoreRelationshipsFromNodeManager(NodeImpl.java:577)
at org.neo4j.kernel.impl.core.NodeImpl.getMoreRelationships(NodeImpl.java:466)
at org.neo4j.kernel.impl.core.NodeImpl.loadInitialRelationships(NodeImpl.java:394)
at org.neo4j.kernel.impl.core.NodeImpl.ensureRelationshipMapNotNull(NodeImpl.java:372)
at org.neo4j.kernel.impl.core.NodeImpl.getAllRelationshipsOfType(NodeImpl.java:219)
at org.neo4j.kernel.impl.core.NodeImpl.getRelationships(NodeImpl.java:325)
at org.neo4j.kernel.impl.core.NodeProxy.getRelationships(NodeProxy.java:154)
at org.neo4j.kernel.StandardExpander$RegularExpander.doExpand(StandardExpander.java:583)
at org.neo4j.kernel.StandardExpander$RelationshipExpansion.iterator(StandardExpander.java:195)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationshipsWithoutChecks(TraversalBranchImpl.java:115)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationships(TraversalBranchImpl.java:104)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.initialize(TraversalBranchImpl.java:131)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.next(TraversalBranchImpl.java:151)
at org.neo4j.graphdb.traversal.PreorderDepthFirstSelector.next(PreorderDepthFirstSelector.java:49)
at org.neo4j.kernel.impl.traversal.MonoDirectionalTraverserIterator.fetchNextOrNull(MonoDirectionalTraverserIterator.java:68)
at org.neo4j.kernel.impl.traversal.MonoDirectionalTraverserIterator.fetchNextOrNull(MonoDirectionalTraverserIterator.java:35)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:55)
at org.neo4j.kernel.impl.traversal.DefaultTraverser$ResourcePathIterableWrapper$1.fetchNextOrNull(DefaultTraverser.java:140)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:55)
at main.QueryExecutor.main(QueryExecutor.java:173)

2 answers:

Answer 0 (score: 1)

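It looks like you are eagerly consuming the whole iterator when you perform the second traversal with IteratorUtil.asCollection(). I am not sure how many nodes that yields in this case, but if there are many (i.e. millions), it could well cause the out-of-memory problem.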
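
To illustrate the difference between eager and lazy consumption, here is a minimal sketch in plain Java. It does not use Neo4j at all: lazyNodeIds is a hypothetical stand-in for a traverser that produces results on demand, and the class and method names are made up for this example. It assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class TraversalConsumption {

    /** Hypothetical stand-in for a lazy traverser: yields ids 0..count-1 on demand. */
    static Iterator<Long> lazyNodeIds(final long count) {
        return new Iterator<Long>() {
            private long next = 0;

            @Override
            public boolean hasNext() {
                return next < count;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return next++;
            }
        };
    }

    /** Eager: copies every element into a list first, so memory grows with the result size. */
    static long countEagerly(Iterator<Long> ids) {
        List<Long> all = new ArrayList<>();
        while (ids.hasNext()) {
            all.add(ids.next());
        }
        return all.size();
    }

    /** Lazy: handles one element at a time and keeps only a counter. */
    static long countLazily(Iterator<Long> ids) {
        long seen = 0;
        while (ids.hasNext()) {
            ids.next();
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(countEagerly(lazyNodeIds(1_000)));
        System.out.println(countLazily(lazyNodeIds(1_000_000)));
    }
}
```

Both methods return the same count, but only the eager one has to hold every element in memory at once. Collecting a traversal into a collection corresponds to the eager variant.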

Answer 1 (score: 0)

I solved my problem by setting the cache_type option to none. It no longer runs out of memory and finishes in about an hour.
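
For an embedded database, the setting might be applied roughly as follows. This is only a sketch that assumes the Neo4j 2.x embedded API (GraphDatabaseFactory and GraphDatabaseSettings.cache_type); the class name NoCacheDatabase and the storeDir parameter are hypothetical, and the available settings differ between Neo4j versions, so check the documentation for the release you run.

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.factory.GraphDatabaseSettings;

final class NoCacheDatabase {

    /** Opens an embedded database with the object cache disabled (assumes Neo4j 2.x). */
    static GraphDatabaseService open(String storeDir) {
        return new GraphDatabaseFactory()
                .newEmbeddedDatabaseBuilder(storeDir)                 // storeDir: path to the store (hypothetical)
                .setConfig(GraphDatabaseSettings.cache_type, "none")  // the option named in this answer
                .newGraphDatabase();
    }
}
```

With the cache disabled, nodes and relationships loaded during the traversal are not retained on the heap by the cache, which fits the behaviour described above.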