Currently I am running a Spark cluster in standalone mode with Docker, on a physical machine with 16 GB of RAM running Ubuntu 16.04.1 x64.
RAM configuration of the Spark cluster containers: master 4g, slave1 2g, slave2 2g, slave3 2g:
docker run -itd --net spark -m 4g -p 8080:8080 --name master --hostname master MyAccount/spark &> /dev/null
docker run -itd --net spark -m 2g -p 8081:8080 --name slave1 --hostname slave1 MyAccount/spark &> /dev/null
docker run -itd --net spark -m 2g -p 8082:8080 --name slave2 --hostname slave2 MyAccount/spark &> /dev/null
docker run -itd --net spark -m 2g -p 8083:8080 --name slave3 --hostname slave3 MyAccount/spark &> /dev/null
docker exec -it master sh -c 'service ssh start' > /dev/null
docker exec -it slave1 sh -c 'service ssh start' > /dev/null
docker exec -it slave2 sh -c 'service ssh start' > /dev/null
docker exec -it slave3 sh -c 'service ssh start' > /dev/null
docker exec -it master sh -c '/usr/local/spark/sbin/start-all.sh' > /dev/null
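Since the `docker run` lines above redirect all output to /dev/null, startup failures (for example a host-port conflict) would go unnoticed. Before submitting jobs, it can help to confirm that all three workers actually registered with the master. A minimal diagnostic sketch, assuming the standalone master exposes its status as JSON at http://master:8080/json (the exact field names may vary between Spark versions):

```python
import json
from urllib.request import urlopen


def count_alive_workers(status):
    """Count workers reported in the ALIVE state in the master's status JSON."""
    return sum(1 for w in status.get("workers", []) if w.get("state") == "ALIVE")


if __name__ == "__main__":
    # Example payload shaped like the master's /json response (hypothetical values)
    sample = {"workers": [{"host": "slave1", "state": "ALIVE"},
                          {"host": "slave2", "state": "ALIVE"},
                          {"host": "slave3", "state": "DEAD"}]}
    print(count_alive_workers(sample))  # 2

    # Against a live cluster, run inside the master container:
    # status = json.loads(urlopen("http://master:8080/json").read())
    # print(count_alive_workers(status))
```

If fewer than three workers show up as ALIVE, the problem starts at the container level rather than in the Spark job.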
There is about 170 GB of data in my MongoDB database.
I run MongoDB with ./mongod directly on the local host (not in Docker), without any replication or sharding.
I use the Stratio Spark-MongoDB connector.
I run the following command on the "master" container:
/usr/local/spark/bin/spark-submit --master spark://master:7077 --executor-memory 2g --executor-cores 1 --packages com.stratio.datasource:spark-mongodb_2.11:0.12.0 code.py
code.py:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Register the MongoDB collection as a temporary view via the Stratio connector
spark.sql("CREATE TEMPORARY VIEW tmp_tb USING com.stratio.datasource.mongodb OPTIONS (host 'MyPublicIP:27017', database 'firewall', collection 'log_data')")
df = spark.sql("SELECT * FROM tmp_tb")
df.show()
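One thing worth checking in the spark-submit command above: `--executor-memory 2g` sets the executor JVM heap to the same size as the container's cgroup limit (`-m 2g`), leaving no headroom for the JVM's own off-heap memory (thread stacks, metaspace, direct buffers). A rough illustration, using the rule of thumb Spark applies on YARN (max(384 MB, 10% of heap) of overhead; standalone mode does not enforce this, but the JVM still needs comparable headroom):

```python
# Rough check: does heap + typical JVM overhead fit in the container limit?
heap_mb = 2048                          # --executor-memory 2g
overhead_mb = max(384, int(heap_mb * 0.10))  # YARN-style overhead heuristic
total_mb = heap_mb + overhead_mb
container_limit_mb = 2048               # docker run -m 2g

print(total_mb)                          # 2432
print(total_mb > container_limit_mb)     # True: the executor cannot fit
```

Under this assumption, the kernel would deny the JVM new thread stacks once the cgroup cap is hit, which is one plausible source of "pthread_create failed: Resource temporarily unavailable". Lowering `--executor-memory` to around 1536m, or raising `-m`, would restore headroom.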
I modified /etc/security/limits.conf and /etc/security/limits.d/20-nproc.conf as follows:
* soft nofile unlimited
* hard nofile 131072
* soft nproc unlimited
* hard nproc unlimited
* soft fsize unlimited
* hard fsize unlimited
* soft memlock unlimited
* hard memlock unlimited
* soft cpu unlimited
* hard cpu unlimited
* soft as unlimited
* hard as unlimited
root soft nofile unlimited
root hard nofile 131072
root soft nproc unlimited
root hard nproc unlimited
root soft fsize unlimited
root hard fsize unlimited
root soft memlock unlimited
root hard memlock unlimited
root soft cpu unlimited
root hard cpu unlimited
root soft as unlimited
root hard as unlimited
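"pthread_create failed: Resource temporarily unavailable" usually points at the nproc limit or at memory exhaustion rather than open files, and the limits that actually apply inside each container can differ from what limits.conf sets on the host. A diagnostic sketch using Python's standard `resource` module (run it inside a container, e.g. via `docker exec`, to see the effective values there):

```python
import resource


def show_limit(name, res):
    """Print and return the (soft, hard) pair for one resource limit."""
    soft, hard = resource.getrlimit(res)
    fmt = lambda v: "unlimited" if v == resource.RLIM_INFINITY else str(v)
    print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
    return soft, hard


if __name__ == "__main__":
    show_limit("nproc (max user processes/threads)", resource.RLIMIT_NPROC)
    show_limit("nofile (open files)", resource.RLIMIT_NOFILE)
    show_limit("as (virtual memory)", resource.RLIMIT_AS)
    show_limit("stack (per-thread stack)", resource.RLIMIT_STACK)
```

If the values printed inside the container do not match the ones configured on the host, the container runtime (not limits.conf) is the place to adjust them, e.g. with `docker run --ulimit`.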
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63682
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 131072
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
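Note the stack size of 8192 kB in the `ulimit -a` output above: every new thread reserves a stack of that size, so the number of threads a JVM can create is bounded by available memory as well as by nproc. A back-of-the-envelope calculation using the figures from this post (the real ceiling is lower, since the heap and the JVM itself also consume the container's memory):

```python
# Rough upper bound on 8 MB thread stacks that fit in one 2 GB worker container.
container_mem_kb = 2 * 1024 * 1024   # 2 GB container limit (-m 2g)
stack_kb = 8192                      # per-thread stack size from `ulimit -s`

max_thread_stacks = container_mem_kb // stack_kb
print(max_thread_stacks)  # 256
```

A few hundred threads is not much for a Spark executor plus a MongoDB connector opening sockets, which is consistent with hitting pthread_create failures under load.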
In addition, I added the following to /etc/sysctl.conf:
kernel.pid_max=200000
vm.max_map_count=600000
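Raising vm.max_map_count is relevant because it also bounds thread creation indirectly: each new thread typically adds at least two memory mappings to the process (its stack plus a guard region). An illustrative ceiling estimate, not an exact kernel formula (the baseline mapping count for the JVM is a made-up placeholder):

```python
max_map_count = 600000        # value set in /etc/sysctl.conf above
mappings_per_thread = 2       # stack + guard mapping (approximation)
baseline_mappings = 1000      # hypothetical mappings already used by the JVM

thread_ceiling = (max_map_count - baseline_mappings) // mappings_per_thread
print(thread_ceiling)  # 299500
```

With a value of 600000, max_map_count is unlikely to be the binding constraint here; the memory and nproc limits are the more probable culprits.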
Then, after rebooting, I ran the Spark program again.
I still get the following errors: pthread_create failed: Resource temporarily unavailable
and com.mongodb.MongoException$Network: Exception opening the socket.
Error snapshot:
Is the physical memory insufficient? Or which part of the configuration did I get wrong?
Thanks.