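I have a Neo4j database with about 120 million nodes. I'm using the traversal framework to traverse my graph and count the occurrences of certain nodes. This works like a charm. Unfortunately, when I run the code on the whole dataset, I run out of memory.
I have allocated 4 GB to the Java VM, and I believe I am committing my transactions (calling tx.success() inside a try-with-resources statement), but my heap still fills up very quickly.
You can find my code below. First, I fetch about 40 versions (these are the root nodes). Then, for each of them, I look up all adjacent child nodes. For each of these child nodes (files), I search the entire subtree for occurrences of a certain node.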
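The three-level walk described above (versions, then files, then each file's ISPARENTOF subtree) can be modelled without Neo4j. The sketch below is a minimal, library-free illustration, assuming plain adjacency maps keyed by integer IDs in place of the FILES and ISPARENTOF relationships, and a `Predicate` in place of the evaluator `e`; `VersionFileWalk` and its methods are hypothetical names. The subtree search uses an explicit stack, so only pending nodes are held rather than the whole subtree.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical model: integer IDs and adjacency maps stand in for Neo4j
// nodes and the FILES / ISPARENTOF relationships. Assumes a tree (no cycles).
class VersionFileWalk {

    // Counts nodes below `start` (start itself excluded, mirroring
    // Evaluators.excludeStartPosition()) that satisfy `wanted`.
    // The explicit stack gives depth-first order and holds only pending nodes.
    static int countInSubtree(Map<Integer, List<Integer>> isParentOf,
                              int start, Predicate<Integer> wanted) {
        int count = 0;
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int node = stack.pop();
            if (node != start && wanted.test(node)) {
                count++;
            }
            for (int child : isParentOf.getOrDefault(node, Collections.emptyList())) {
                stack.push(child);
            }
        }
        return count;
    }

    // For each version, sums the matches found under each of its files.
    static Map<Integer, Integer> countPerVersion(List<Integer> versions,
                                                 Map<Integer, List<Integer>> files,
                                                 Map<Integer, List<Integer>> isParentOf,
                                                 Predicate<Integer> wanted) {
        Map<Integer, Integer> result = new HashMap<>();
        for (int v : versions) {
            int total = 0;
            for (int f : files.getOrDefault(v, Collections.emptyList())) {
                total += countInSubtree(isParentOf, f, wanted);
            }
            result.put(v, total);
        }
        return result;
    }
}
```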
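My understanding was that using
try (Transaction tx = db.beginTx()) {
}
closes my transaction automatically, but my heap still fills up. This makes my query slow down by the second or third version and eventually crash. Am I misunderstanding something, or is there anything else I can do?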
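try-with-resources does call `close()` automatically when the block exits, whether normally or through an exception, but closing a resource cannot free objects that are still referenced from outside the block. A minimal sketch, assuming a hypothetical `FakeTx` class (not a Neo4j type) that only records whether `close()` ran:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transaction; it only records whether close() ran.
class FakeTx implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

class TryWithResourcesDemo {
    static FakeTx lastTx; // kept only so the effect of close() can be checked

    static List<String> collect(int n) {
        List<String> kept = new ArrayList<>();
        try (FakeTx tx = new FakeTx()) {
            lastTx = tx;
            for (int i = 0; i < n; i++) {
                kept.add("node-" + i);
            }
        } // tx.close() is called here automatically
        return kept; // all n entries are still reachable after the close
    }
}
```

In the question's code, the `Collection` filled by `IteratorUtil.asCollection()` plays the role of `kept`: it is declared outside the transaction block and outlives it.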
Collection<Node> versions;
Collection<Node> files;
Collection<Node> nodes;
try ( Transaction tx = db.beginTx() )
{
    // All versions hang directly off node 0 via HASVERSION
    versions = IteratorUtil.asCollection(db.traversalDescription()
            .breadthFirst()
            .relationships(ProjectRelations.HASVERSION, Direction.OUTGOING)
            .evaluator(Evaluators.toDepth(1))
            .evaluator(Evaluators.excludeStartPosition())
            .traverse(db.getNodeById(0)).nodes());
    tx.success();
}
for (Node v : versions) {
    int fors = 0;
    test = 0;
    try (Transaction tx = db.beginTx()) {
        // Copies every file node of this version into a collection
        files = IteratorUtil.asCollection(db.traversalDescription()
                .breadthFirst()
                .relationships(ProjectRelations.FILES, Direction.OUTGOING)
                .evaluator(Evaluators.excludeStartPosition())
                .traverse(v).nodes());
        tx.success();
    }
    for (Node f : files) {
        try (Transaction t = db.beginTx()) {
            int i = 0;
            // e is the custom Evaluator for the node being counted (defined elsewhere)
            for (Node node : db.traversalDescription()
                    .depthFirst()
                    .relationships(RelTypes.ISPARENTOF, Direction.OUTGOING)
                    .evaluator(Evaluators.excludeStartPosition())
                    .evaluator(e)
                    .traverse(f).nodes()) {
                // do some stuff
            }
            t.success();
        }
    }
    files.clear();
}
versions.clear();
Update
I replaced everything with iterators, for example:
try (
    Transaction tx = db.beginTx();
    ResourceIterator<Node> files = db.traversalDescription()
            .breadthFirst()
            .relationships(ProjectRelations.FILES, Direction.OUTGOING)
            .evaluator(Evaluators.excludeStartPosition())
            .traverse(v).nodes().iterator();
) {
    int idx = 0;
    forloops = 0;
    long start = System.nanoTime();
    while (files.hasNext()) {
        Node f = files.next();
        try (Transaction t = db.beginTx();
             ResourceIterator<Node> blah = db.traversalDescription()
                     .depthFirst()
                     .relationships(RelTypes.ISPARENTOF, Direction.OUTGOING)
                     .evaluator(Evaluators.excludeStartPosition())
                     .evaluator(e)
                     .traverse(f).nodes().iterator();
        ) {
            int i = 0;
            while (blah.hasNext()) {
                Node tempNode = blah.next();
            }
            blah.close(); // redundant: try-with-resources closes it anyway
        }
    }
    files.close(); // redundant for the same reason
}
} // closes the enclosing loop over the versions (not shown)
The problem is that the transaction keeps everything in memory until I exhaust the iterator or close() it.
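try-with-resources guarantees the close even when the loop exits before the iterator is exhausted. A minimal sketch, assuming a hypothetical `LazyRange` class that follows the same Iterator-plus-AutoCloseable shape as the `ResourceIterator` in the question (it is not the Neo4j class):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lazy iterator with an Iterator + AutoCloseable shape similar to
// ResourceIterator: values are produced one at a time and never stored.
class LazyRange implements Iterator<Integer>, AutoCloseable {
    private final int end;
    private int cursor = 0;
    boolean closed = false;

    LazyRange(int end) {
        this.end = end;
    }

    @Override
    public boolean hasNext() {
        return !closed && cursor < end;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return cursor++;
    }

    @Override
    public void close() {
        closed = true;
    }

    int produced() {
        return cursor;
    }
}

class EarlyClose {
    // Returns the first value >= limit, or -1. close() runs on every exit path,
    // including the early return before the iterator is exhausted.
    static int findFirstAtLeast(LazyRange source, int limit) {
        try (LazyRange it = source) {
            while (it.hasNext()) {
                int value = it.next();
                if (value >= limit) {
                    return value;
                }
            }
            return -1;
        }
    }
}
```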
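Edit 2:
I now use iterators for everything, with depth-first traversals. I also reduced the available heap from 4 GB to 1024 MB. It now seems to run (though I'm not sure it will finish), but very slowly. Memory climbs to about 980 MB but has not gone past that threshold (yet). Because the heap is full the whole time, there is of course a huge slowdown. Any ideas for improvement, or is this the best I can do?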
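Switching to depth-first helps in principle because a depth-first traversal only keeps the pending siblings along the current path, while a breadth-first traversal keeps an entire level. The synthetic comparison below runs both orders over a complete tree and records the peak number of pending entries; `FrontierSize` is a hypothetical helper, and Neo4j's traverser keeps additional bookkeeping of its own (for example for uniqueness checks) that this sketch ignores.

```java
import java.util.ArrayDeque;

// Peak number of pending nodes for BFS vs DFS on a complete tree with the given
// branching factor and depth. Each queued entry is just that node's depth.
class FrontierSize {

    static int peakBfs(int branching, int depth) {
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.add(0);
        int peak = 1;
        while (!queue.isEmpty()) {
            int d = queue.poll();
            if (d < depth) {
                for (int i = 0; i < branching; i++) {
                    queue.add(d + 1);
                }
            }
            peak = Math.max(peak, queue.size());
        }
        return peak;
    }

    static int peakDfs(int branching, int depth) {
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        stack.push(0);
        int peak = 1;
        while (!stack.isEmpty()) {
            int d = stack.pop();
            if (d < depth) {
                for (int i = 0; i < branching; i++) {
                    stack.push(d + 1);
                }
            }
            peak = Math.max(peak, stack.size());
        }
        return peak;
    }
}
```

For branching factor 3 and depth 4, breadth-first peaks at the 81 leaves of the last level, while depth-first peaks at 9 entries (depth × (branching − 1) + 1).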
try (Transaction tx = db.beginTx()) {
    // .nodes() makes the traversal yield Nodes instead of Paths
    versions = IteratorUtil.asCollection(db
            .traversalDescription()
            .depthFirst()
            .relationships(ProjectRelations.HASVERSION, Direction.OUTGOING)
            .evaluator(Evaluators.toDepth(1))
            .evaluator(Evaluators.excludeStartPosition())
            .traverse(root)
            .nodes());
}
int mb = 1024 * 1024;
Runtime runtime = Runtime.getRuntime();
ResourceIterator<Node> files = null;
try (Transaction tx = db.beginTx()) {
    int idx = 0;
    for (Relationship rel : root.getRelationships(ProjectRelations.HASVERSION, Direction.OUTGOING)) {
        idx++;
        System.out.println(idx);
        Node v = rel.getEndNode();
        files = db.traversalDescription()
                .depthFirst()
                .relationships(ProjectRelations.FILES, Direction.OUTGOING)
                .evaluator(Evaluators.excludeStartPosition())
                .uniqueness(Uniqueness.NONE)
                .traverse(v).nodes().iterator();
        long start = System.nanoTime();
        while (files.hasNext()) {
            Node f = files.next();
            ResourceIterator<Node> node = db.traversalDescription()
                    .depthFirst()
                    .relationships(RelTypes.ISPARENTOF, Direction.OUTGOING)
                    .evaluator(Evaluators.excludeStartPosition())
                    .evaluator(e)
                    .traverse(f).nodes().iterator();
            while (node.hasNext()) {
                node.next();
            }
        }
        System.out.println("Used Memory:"
                + (runtime.totalMemory() - runtime.freeMemory()) / mb);
        System.out.println("Total Memory:" + runtime.totalMemory() / mb);
        files.close();
    }
}
db.shutdown();
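The memory report printed after each version computes used heap as `totalMemory() - freeMemory()`. `totalMemory()` is the heap the JVM has currently reserved, which can grow up to `maxMemory()` (the `-Xmx` limit), so printing `maxMemory()` as well shows how close a reading such as ~980 MB is to the 1024 MB cap. A small sketch of that arithmetic; `HeapReport` is a hypothetical helper:

```java
// Hypothetical helper reproducing the "Used Memory" arithmetic from the loop.
class HeapReport {
    static final long MB = 1024L * 1024L;

    // Heap in use = reserved heap minus the free part of it, in whole MB.
    static long usedMb(long totalBytes, long freeBytes) {
        return (totalBytes - freeBytes) / MB;
    }

    static String report(Runtime runtime) {
        return "Used Memory: " + usedMb(runtime.totalMemory(), runtime.freeMemory())
                + " MB, Total Memory: " + runtime.totalMemory() / MB
                + " MB, Max Memory: " + runtime.maxMemory() / MB + " MB";
    }
}
```

Usage: `System.out.println(HeapReport.report(Runtime.getRuntime()));` in place of the two `println` calls.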
The exception that was thrown:
Exception in thread "GC-Monitor" Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at ch.qos.logback.core.pattern.FormattingConverter.write(FormattingConverter.java:40)
at ch.qos.logback.core.pattern.PatternLayoutBase.writeLoopOnConverters(PatternLayoutBase.java:119)
at ch.qos.logback.classic.PatternLayout.doLayout(PatternLayout.java:168)
at ch.qos.logback.classic.PatternLayout.doLayout(PatternLayout.java:59)
at ch.qos.logback.core.encoder.LayoutWrappingEncoder.doEncode(LayoutWrappingEncoder.java:134)
at ch.qos.logback.core.OutputStreamAppender.writeOut(OutputStreamAppender.java:188)
at ch.qos.logback.core.FileAppender.writeOut(FileAppender.java:206)
at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:212)
at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:103)
at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:88)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:272)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:259)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:441)
at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:395)
at ch.qos.logback.classic.Logger.warn(Logger.java:708)
at org.neo4j.kernel.logging.LogbackService$Slf4jToStringLoggerAdapter.warn(LogbackService.java:240)
at org.neo4j.kernel.impl.cache.MeasureDoNothing.run(MeasureDoNothing.java:84)
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(Unknown Source)
at org.neo4j.kernel.impl.core.RelationshipLoader.getMoreRelationships(RelationshipLoader.java:55)
at org.neo4j.kernel.impl.core.NodeManager.getMoreRelationships(NodeManager.java:779)
at org.neo4j.kernel.impl.core.NodeImpl.loadMoreRelationshipsFromNodeManager(NodeImpl.java:577)
at org.neo4j.kernel.impl.core.NodeImpl.getMoreRelationships(NodeImpl.java:466)
at org.neo4j.kernel.impl.core.NodeImpl.loadInitialRelationships(NodeImpl.java:394)
at org.neo4j.kernel.impl.core.NodeImpl.ensureRelationshipMapNotNull(NodeImpl.java:372)
at org.neo4j.kernel.impl.core.NodeImpl.getAllRelationshipsOfType(NodeImpl.java:219)
at org.neo4j.kernel.impl.core.NodeImpl.getRelationships(NodeImpl.java:325)
at org.neo4j.kernel.impl.core.NodeProxy.getRelationships(NodeProxy.java:154)
at org.neo4j.kernel.StandardExpander$RegularExpander.doExpand(StandardExpander.java:583)
at org.neo4j.kernel.StandardExpander$RelationshipExpansion.iterator(StandardExpander.java:195)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationshipsWithoutChecks(TraversalBranchImpl.java:115)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationships(TraversalBranchImpl.java:104)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.initialize(TraversalBranchImpl.java:131)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.next(TraversalBranchImpl.java:151)
at org.neo4j.graphdb.traversal.PreorderDepthFirstSelector.next(PreorderDepthFirstSelector.java:49)
at org.neo4j.kernel.impl.traversal.MonoDirectionalTraverserIterator.fetchNextOrNull(MonoDirectionalTraverserIterator.java:68)
at org.neo4j.kernel.impl.traversal.MonoDirectionalTraverserIterator.fetchNextOrNull(MonoDirectionalTraverserIterator.java:35)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:55)
at org.neo4j.kernel.impl.traversal.DefaultTraverser$ResourcePathIterableWrapper$1.fetchNextOrNull(DefaultTraverser.java:140)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:55)
at main.QueryExecutor.main(QueryExecutor.java:173)
Answer 0 (score: 1)
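It looks like you are eagerly consuming the whole iterator when you run the second traversal with IteratorUtil.asCollection(). I'm not sure how many nodes that yields in your case, but if there are a lot of them (i.e. millions), that could well be what is causing the out-of-memory problem.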
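The difference the answer points at can be shown with plain Java: copying an iterator into a list first (roughly what `IteratorUtil.asCollection()` does) keeps every element reachable at once, while consuming it element by element only ever references the current one. `EagerVsLazy` is a hypothetical name, and `peakHeld` counts only the references held by each method itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntPredicate;

// Contrasts copying a lazy iterator into a list first with consuming it
// one element at a time. Both produce the same count.
class EagerVsLazy {

    static final class Result {
        final int matches;
        final int peakHeld; // most elements this method referenced at once

        Result(int matches, int peakHeld) {
            this.matches = matches;
            this.peakHeld = peakHeld;
        }
    }

    // Eager: every element stays reachable from `all` until the method returns.
    static Result countEager(Iterator<Integer> source, IntPredicate wanted) {
        List<Integer> all = new ArrayList<>();
        while (source.hasNext()) {
            all.add(source.next());
        }
        int matches = 0;
        for (int x : all) {
            if (wanted.test(x)) {
                matches++;
            }
        }
        return new Result(matches, all.size());
    }

    // Lazy: only the current element is referenced; earlier ones become garbage.
    static Result countLazy(Iterator<Integer> source, IntPredicate wanted) {
        int matches = 0;
        int peakHeld = 0;
        while (source.hasNext()) {
            int x = source.next();
            peakHeld = 1;
            if (wanted.test(x)) {
                matches++;
            }
        }
        return new Result(matches, peakHeld);
    }
}
```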
Answer 1 (score: 0)
I solved my problem by setting the cache_type option to none. It no longer runs out of memory and finishes in about an hour.
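In older embedded Neo4j versions, `cache_type` was a database setting (for example in `neo4j.properties`) that controlled the object cache for nodes and relationships, and `none` disables it. The stack trace above shows the heap running out while `NodeImpl.loadInitialRelationships` loads relationships, which fits that cache growing during a full scan. The toy model below is not Neo4j's implementation; `NodeLoader` is hypothetical and its cache never evicts. It only illustrates the trade-off: with caching on, a pass over many nodes keeps every loaded entry on the heap; with caching off, memory stays flat but repeated visits re-read from the store.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of an object cache in front of a store; NOT Neo4j's implementation.
// With caching on, nothing is ever evicted, so every loaded entry stays on the heap.
class NodeLoader {
    private final boolean cacheEnabled;
    private final Map<Long, long[]> cache = new HashMap<>();
    private int reads = 0;

    NodeLoader(boolean cacheEnabled) {
        this.cacheEnabled = cacheEnabled;
    }

    long[] relationshipsOf(long nodeId) {
        long[] cached = cache.get(nodeId);
        if (cached != null) {
            return cached;
        }
        long[] loaded = readFromStore(nodeId);
        if (cacheEnabled) {
            cache.put(nodeId, loaded); // stays reachable for the loader's lifetime
        }
        return loaded;
    }

    // Synthetic stand-in for reading relationship records from disk.
    private long[] readFromStore(long nodeId) {
        reads++;
        return new long[] {nodeId * 10, nodeId * 10 + 1};
    }

    int cachedNodes() {
        return cache.size();
    }

    int readCount() {
        return reads;
    }
}
```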