I'm trying to run a Spark instance in Docker, and it frequently throws this exception:
16/10/30 23:20:26 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-1,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
I'm using this Docker image: https://github.com/sequenceiq/docker-spark.
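For reference, I start the container roughly as the project README suggests (the tag and port mappings below are from memory and may not match exactly):

# launch the docker-spark container (tag/ports assumed from the README)
docker run -it -p 8088:8088 -p 8042:8042 -h sandbox sequenceiq/spark:1.6.0 bash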
My ulimits inside the container seem fine:
bash-4.1# ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 29747
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1048576
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
They also look fine outside the container, on the host:
kane@thinkpad ~> ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 29747
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 29747
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
My Googling tells me that systemd can limit the number of tasks and cause this problem, but I've already set my task limit to infinity:
kane@thinkpad ~> grep TasksMax /usr/lib/systemd/system/docker.service
20:TasksMax=infinity
kane@thinkpad ~> systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2016-10-31 08:22:39 AWST; 3h 14min ago
Docs: http://docs.docker.com
Main PID: 1107 (docker-current)
Tasks: 56
Memory: 34.9M
CPU: 30.292s
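To rule out Docker's own pids cgroup as well, I can inspect the limit applied to the container directly (a sketch; <container-id> is a placeholder, and the path assumes cgroup v1 with the default cgroupfs driver):

# find the running container's ID
docker ps
# show the container's pids limit; "max" means unlimited
cat /sys/fs/cgroup/pids/docker/<container-id>/pids.max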
Any ideas? My Spark code simply reads from a Kafka instance (running in a separate Docker container) and performs a basic map/reduce; nothing fancy.
Answer 0 (score: 0)
The error indicates that you cannot create more native threads because there isn't enough memory. It doesn't necessarily mean you've hit your ulimits; it means there isn't enough memory left to allocate the stack for a new thread.
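One quick way to confirm this is to watch the executor JVM's thread count inside the container while the job runs (a sketch; <executor-pid> is a placeholder, and it assumes the JDK's jps tool is on the PATH):

# find the executor JVM's PID
jps
# print the number of threads (lightweight processes) in that JVM
ps -o nlwp= -p <executor-pid>
# kernel-wide ceiling on threads, for comparison
cat /proc/sys/kernel/threads-max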
The stack size of each thread created in the JVM is controlled by the -Xss flag, which defaults to 1024k if I remember correctly. If you're not making deeply recursive calls, you can try reducing -Xss, which lets you create more threads with the same amount of available memory. If -Xss is too small, you'll hit a StackOverflowError.
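With Spark you can pass a smaller stack size to the executor JVMs through spark.executor.extraJavaOptions; for example (the 512k value is just an illustration, and your-app.jar stands in for your application):

# submit with a reduced per-thread stack size on the executors
spark-submit \
  --conf "spark.executor.extraJavaOptions=-Xss512k" \
  your-app.jar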
The docker-spark image is built on the hadoop-docker image, which also runs HDFS and YARN services inside the container. You may be allocating too much of the container's memory to those JVM heaps (HDFS, YARN), leaving too little memory to allocate new threads.
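If that is the case, capping the Spark executor heap is a quick experiment (the 1g value is illustrative; the Hadoop daemons' heaps can similarly be lowered via HADOOP_HEAPSIZE in hadoop-env.sh):

# cap the executor heap so more container memory is left for thread stacks
spark-submit --executor-memory 1g your-app.jar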
Hope it helps.