Neo4j: Worker for session .. crashed. Java heap space OutOfMemoryError

Time: 2017-12-27 16:21:31

Tags: apache-spark neo4j

The Spark job I'm running:

It's a really simple program that was converted from Java to Scala and 'parallelized'. (It isn't really meant to run in parallel; it's an experiment to (a) learn Spark and Neo4j, and (b) see whether I can get some speed-up just by running on a Spark cluster with more nodes doing more of the work.) The reason is that the big bottleneck is the spatial call in the Neo4j Cypher script (the withinDistance call). The test dataset is quite small: 52,000 nodes and a database of about 140 MB.

Also, when Neo4j starts it gives me a warning about the maximum number of open files. That's odd, because I thought that limit was about open files and I had asked the system admins to set it higher. Checking the hard limit with

ulimit -Hn

seems to confirm that it was raised (C02RH2U9G8WM:scala-2.11 little.mac$ ulimit -Hn reports unlimited), although ulimit -a shows an open-files soft limit of 4096, and I think that is what Neo4j is actually seeing and complaining about.
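For what it's worth, this is the kind of sanity check I use to tell the soft and hard limits apart (the 40000 is only an illustrative value, not what is configured on the box; when Neo4j is started as a service, the soft limit typically has to be raised in the service configuration rather than in an interactive shell):

    # soft limit: what a newly started process (like Neo4j) actually runs with
    ulimit -Sn

    # hard limit: the ceiling the soft limit can be raised to (this is the one that shows unlimited)
    ulimit -Hn

    # raise the soft limit for the current shell before launching Neo4j (value is illustrative)
    ulimit -n 40000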

Also, when I run this locally on my Mac OS X machine, the software runs for about 14 hours or so (maybe 9) and then I see in the console that the database stops talking to the Spark job. The database isn't down and the job doesn't time out; I can still run Cypher against the database. But it somehow loses the connection to the Spark jobs, so they retry, and eventually the spark-submit just gives up and stops.

while (going through all these lats and lons)
{
    def DoCalculation()
    {
        val noBbox = "call spatial.bbox('geom', {lat:" + minLat + ",lon:" + minLon + "}, {lat:" + maxLat + ",lon:" + maxLon + "}) yield node return node.altitude as altitude, node.gtype as gtype, node.toDateFormatLong as toDateFormatLong, node.latitude as latitude, node.longitude as longitude, node.fromDateFormatLong as fromDateFormatLong, node.fromDate as fromDate, node.toDate as toDate ORDER BY node.toDateFormatLong DESC"
        try {
            // not overly sure what the partitions and batch are really doing for me
            val initialDf2 = neo.cypher(noBbox).partitions(5).batch(10000).loadDataFrame

            val theRow = initialDf2.collect() // was someStr

            for (i <- 0 until theRow.length) {
                // do more calculations

                var radius2 = 100
                // this call is where the biggest bottleneck is; the spatial withinDistance is where I thought
                // I could put this code on Spark, make the calls through DataFrames, do the same long work,
                // and get more speed gains by batching it out to many nodes
                val pointQuery = "call spatial.withinDistance('geom', {lat:" + lat + ",lon:" + lon + "}, " + radius2 + ") yield node, distance WITH node, distance match (node:POINT) WHERE node.toDateFormatLong < " + toDateFormatLong + " return node.fromDateFormatLong as fromDateFormatLong, node.toDateFormatLong as toDateFormatLong"
                try {
                    val pointResults = neo.cypher(pointQuery).loadDataFrame // did I need to batch here?
                    var prRow = pointResults.collect()
                    // do stuff with prRow
                } catch {
                    case e: Exception => e.printStackTrace
                }
                // do way more stuff with the data in plain Scala/Java data structures
            }
        } catch {
            case e: Exception => println("EMPTY COLLECTION")
        }
    }
}

(Edit: since the last edit I have raised the limits in the neo4j conf even further; the heap size max memory is now 4 GB.)

Some code bits from the job are shown above (the code was ported to Scala, with Spark DataFrames added; I know it isn't parallelized properly, but I was hoping to get it working before pushing any further). I'm essentially building a hybrid of the Java program I ported from, but using DataFrames from Spark connected to Neo4j. The listing above is basically pseudocode for what it does.
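For reference, here is roughly how the `neo` handle used above gets created in my job. This is a simplified sketch; the Bolt URL and credentials are placeholders, not the real values:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.neo4j.spark._

    // Simplified sketch -- bolt URL and credentials below are placeholders.
    val sparkConf = new SparkConf()
      .setAppName("neo4j-spatial-experiment")
      .set("spark.neo4j.bolt.url", "bolt://localhost:7687")
      .set("spark.neo4j.bolt.user", "neo4j")
      .set("spark.neo4j.bolt.password", "<password>")
    val sc = new SparkContext(sparkConf)

    // The `neo` used in the pseudocode above is this query builder from the neo4j-spark-connector.
    val neo = Neo4j(sc)

    // Each cypher(...) chain builds a DataFrame; partitions()/batch() are meant to
    // split the result set across Spark partitions (the part I'm unsure about).
    val sanityDf = neo.cypher("MATCH (n:POINT) RETURN n.latitude AS latitude LIMIT 10")
      .partitions(1)
      .batch(10)
      .loadDataFrame
    sanityDf.show()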

Running the spark-submit job that connects to Neo4j through the Spark connector, I get these errors in /var/log/neo4j/neo4j.log:

java.lang.OutOfMemoryError: Java heap space
2017-12-27 03:17:13.969+0000 ERROR Worker for session '13662816-0a86-4c95-8b7f-cea9d92440c8' crashed. Java heap space
java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1855)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2068)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
    at org.neo4j.bolt.v1.runtime.concurrent.RunnableBoltWorker.run(RunnableBoltWorker.java:88)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:109)
2017-12-27 03:17:23.244+0000 ERROR Worker for session '75983e7c-097a-4770-bcab-d63f78300dc5' crashed. Java heap space
java.lang.OutOfMemoryError: Java heap space
I know that in the neo4j.conf file I can change the heap (it is currently commented out, but set to 512m). What I'm asking about is this note in the conf file:

# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size.

Here is the debug.log output with the heap left at that default (note the VM arguments contain no explicit -Xms/-Xmx):

2017-12-26 16:24:06.768+0000 INFO [o.n.k.i.DiagnosticsManager] NETWORK
2017-12-26 16:24:06.768+0000 INFO [o.n.k.i.DiagnosticsManager] System memory information:
2017-12-26 16:24:06.771+0000 INFO [o.n.k.i.DiagnosticsManager] Total Physical memory: 7.79 GB
2017-12-26 16:24:06.772+0000 INFO [o.n.k.i.DiagnosticsManager] Free Physical memory: 5.49 GB
2017-12-26 16:24:06.772+0000 INFO [o.n.k.i.DiagnosticsManager] Committed virtual memory: 5.62 GB
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Total swap space: 16.50 GB
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Free swap space: 16.49 GB
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] JVM memory information:
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Free  memory: 85.66 MB
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Total memory: 126.00 MB
2017-12-26 16:24:06.774+0000 INFO [o.n.k.i.DiagnosticsManager] Max   memory: 1.95 GB
2017-12-26 16:24:06.776+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Young Generation: [G1 Eden Space, G1 Survivor Space]
2017-12-26 16:24:06.776+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Old Generation: [G1 Eden Space, G1 Survivor Space, G1 Old Gen]
2017-12-26 16:24:06.777+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Code Cache (Non-heap memory): committed=4.94 MB, used=4.93 MB, max=240.00 MB, threshold=0.00 B
2017-12-26 16:24:06.777+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Metaspace (Non-heap memory): committed=14.38 MB, used=13.41 MB, max=-1.00 B, threshold=0.00 B
2017-12-26 16:24:06.777+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Compressed Class Space (Non-heap memory): committed=1.88 MB, used=1.64 MB, max=1.00 GB, threshold=0.00 B
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Eden Space (Heap memory): committed=39.00 MB, used=35.00 MB, max=-1.00 B, threshold=?
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Survivor Space (Heap memory): committed=3.00 MB, used=3.00 MB, max=-1.00 B, threshold=?
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Old Gen (Heap memory): committed=84.00 MB, used=1.34 MB, max=1.95 GB, threshold=0.00 B
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Operating system information:
2017-12-26 16:24:06.779+0000 INFO [o.n.k.i.DiagnosticsManager] Operating System: Linux; version: 3.10.0-693.5.2.el7.x86_64; arch: amd64; cpus: 8
2017-12-26 16:24:06.779+0000 INFO [o.n.k.i.DiagnosticsManager] Max number of file descriptors: 90000
2017-12-26 16:24:06.780+0000 INFO [o.n.k.i.DiagnosticsManager] Number of open file descriptors: 103
2017-12-26 16:24:06.782+0000 INFO [o.n.k.i.DiagnosticsManager] Process id: 26252@hp380-1
2017-12-26 16:24:06.782+0000 INFO [o.n.k.i.DiagnosticsManager] Byte order: LITTLE_ENDIAN
2017-12-26 16:24:06.793+0000 INFO [o.n.k.i.DiagnosticsManager] Local timezone: Etc/GMT
2017-12-26 16:24:06.793+0000 INFO [o.n.k.i.DiagnosticsManager] JVM information:
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] VM Name: OpenJDK 64-Bit Server VM
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] VM Vendor: Oracle Corporation
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] VM Version: 25.151-b12
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] JIT compiler: HotSpot 64-Bit Tiered Compilers
2017-12-26 16:24:06.795+0000 INFO [o.n.k.i.DiagnosticsManager] VM Arguments: [-XX:+UseG1GC, -XX:-OmitStackTraceInFastThrow, -XX:+AlwaysPreTouch, -XX:+UnlockExperimentalVMOptions, -XX:+TrustFinalNonStaticFields, -XX:+DisableExplicitGC, -Djdk.tls.ephemeralDHKeySize=2048, -Dunsupported.dbms.udc.source=rpm, -Dfile.encoding=UTF-8]
2017-12-26 16:24:06.795+0000 INFO [o.n.k.i.DiagnosticsManager] Java classpath:

So doesn't that mean I should leave the heap settings alone, since whatever it calculates is surely more than anything I could set? (These machines have 8 cores and 8 GB of RAM.) Or does setting them explicitly really help? Maybe to 2000 (if the value is in megabytes), to get two gigs? I ask because I have a feeling the error log is reporting an out-of-memory error while the real cause is something else.

EDIT: my JVM values from the debug.log. The excerpt above shows them with the heap left at the default; after setting it explicitly to 2000m they look like this:

2017-12-27 16:17:30.740+0000 INFO [o.n.k.i.DiagnosticsManager] System memory information:
2017-12-27 16:17:30.749+0000 INFO [o.n.k.i.DiagnosticsManager] Total Physical memory: 7.79 GB
2017-12-27 16:17:30.750+0000 INFO [o.n.k.i.DiagnosticsManager] Free Physical memory: 4.23 GB
2017-12-27 16:17:30.750+0000 INFO [o.n.k.i.DiagnosticsManager] Committed virtual memory: 5.62 GB
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Total swap space: 16.50 GB
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Free swap space: 16.19 GB
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] JVM memory information:
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Free  memory: 1.89 GB
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Total memory: 1.95 GB
2017-12-27 16:17:30.752+0000 INFO [o.n.k.i.DiagnosticsManager] Max   memory: 1.95 GB
2017-12-27 16:17:30.777+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Young Generation: [G1 Eden Space, G1 Survivor Space]
2017-12-27 16:17:30.777+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Old Generation: [G1 Eden Space, G1 Survivor Space, G1 Old Gen]
2017-12-27 16:17:30.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Code Cache (Non-heap memory): committed=4.94 MB, used=4.89 MB, max=240.00 MB, threshold=0.00 B
2017-12-27 16:17:30.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Metaspace (Non-heap memory): committed=14.38 MB, used=13.42 MB, max=-1.00 B, threshold=0.00 B
2017-12-27 16:17:30.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Compressed Class Space (Non-heap memory): committed=1.88 MB, used=1.64 MB, max=1.00 GB, threshold=0.00 B
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Eden Space (Heap memory): committed=105.00 MB, used=59.00 MB, max=-1.00 B, threshold=?
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Survivor Space (Heap memory): committed=0.00 B, used=0.00 B, max=-1.00 B, threshold=?
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Old Gen (Heap memory): committed=1.85 GB, used=0.00 B, max=1.95 GB, threshold=0.00 B
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Operating system information:
2017-12-27 16:17:30.780+0000 INFO [o.n.k.i.DiagnosticsManager] Operating System: Linux; version: 3.10.0-693.5.2.el7.x86_64; arch: amd64; cpus: 8
2017-12-27 16:17:30.780+0000 INFO [o.n.k.i.DiagnosticsManager] Max number of file descriptors: 90000
2017-12-27 16:17:30.781+0000 INFO [o.n.k.i.DiagnosticsManager] Number of open file descriptors: 103
2017-12-27 16:17:30.785+0000 INFO [o.n.k.i.DiagnosticsManager] Process id: 20774@hp380-1
2017-12-27 16:17:30.785+0000 INFO [o.n.k.i.DiagnosticsManager] Byte order: LITTLE_ENDIAN
2017-12-27 16:17:30.814+0000 INFO [o.n.k.i.DiagnosticsManager] Local timezone: Etc/GMT
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] JVM information:
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] VM Name: OpenJDK 64-Bit Server VM
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] VM Vendor: Oracle Corporation
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] VM Version: 25.151-b12
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] JIT compiler: HotSpot 64-Bit Tiered Compilers
2017-12-27 16:17:30.816+0000 INFO [o.n.k.i.DiagnosticsManager] VM Arguments: [-Xms2000m, -Xmx2000m, -XX:+UseG1GC, -XX:-OmitStackTraceInFastThrow, -XX:+AlwaysPreTouch, -XX:+UnlockExperimentalVMOptions, -XX:+TrustFinalNonStaticFields, -XX:+DisableExplicitGC, -Djdk.tls.ephemeralDHKeySize=2048, -Dunsupported.dbms.udc.source=rpm, -Dfile.encoding=UTF-8]
2017-12-27 16:17:30.816+0000 INFO [o.n.k.i.DiagnosticsManager] Java classpath:


Just an FYI: I still seem to get the Java heap errors. These machines (not only used for production) only have 8 GB each.

1 Answer:

Answer 0 (score: 1):

We generally recommend setting these yourself. You can check the debug.log file for the entry logged during startup, which reports the values it has chosen to use as defaults. You are looking for an excerpt like this:

JVM memory information:
Free  memory: 204.79 MB
Total memory: 256.00 MB
Max   memory: 4.00 GB

I believe Total memory is the initial heap size, and Max memory is the maximum heap size.

When setting these yourself, we usually recommend setting the initial and max to the same value. There is a knowledge base article on estimating initial memory configuration that may help.
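As a rough sketch only (the sizes below are placeholders, not a recommendation; use the estimation article above to choose real values), pinning the initial and max heap to the same value in neo4j.conf looks like this:

    # neo4j.conf -- illustrative sizes only
    dbms.memory.heap.initial_size=2g
    dbms.memory.heap.max_size=2g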

If the defaults seem adequate, it may be better to look for other areas to optimize, or to check whether this is a known issue on the apache-spark side.