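I built a Docker image of my application. When I run the application directly from the bash script, it works fine. However, when I run it as part of a docker-compose file, the application hangs at this message: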
18/06/27 13:17:18 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
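Even after I wait for a while, it only fails with heartbeat timeouts. What could cause a Spark Streaming + Neo4j application to behave like this under Docker, and how can it be improved?
The docker-compose file for my application: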
version: '3.3'
services:
  consumer-demo:
    build:
      context: .
      dockerfile: Dockerfile
      args:
        - ARG_CLASS=consumer
        - HOST=neo4jdb
    volumes:
      - ./:/workdir
    working_dir: /workdir
    restart: always
The overall docker-compose file for all the other services:
version: '3.3'
services:
  kafka:
    image: spotify/kafka
    ports:
      - "9092:9092"
    networks:
      - docker_elk
    environment:
      - ADVERTISED_HOST=localhost
  neo4jdb:
    image: neo4j:latest
    container_name: neo4jdb
    ports:
      - "7474:7474"
      - "7473:7473"
      - "7687:7687"
    networks:
      - docker_elk
    volumes:
      - /var/lib/neo4j/import:/var/lib/neo4j/import
      - /var/lib/neo4j/data:/data
      - /var/lib/neo4j/conf:/conf
    environment:
      - NEO4J_dbms_active__database=graphImport.db
  elasticsearch:
    image: elasticsearch:latest
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - docker_elk
    volumes:
      - esdata1:/usr/share/elasticsearch/data
  kibana:
    image: kibana:latest
    ports:
      - "5601:5601"
    networks:
      - docker_elk
volumes:
  esdata1:
    driver: local
networks:
  docker_elk:
    driver: bridge
The bash script with which the application works correctly:
#!/usr/bin/env bash
if [ "$1" = "consumer" ]
then
    java -cp "jars/spark_consumer.jar" consumer.SparkConsumer
else
    echo "Wrong parameter. It should be consumer or producer, but it is $1"
fi
The application's Dockerfile, which might be the reason the application runs slowly:
FROM java:8
ARG ARG_CLASS
ARG HOST
ENV MAIN_CLASS $ARG_CLASS
ENV SCALA_VERSION 2.11.8
ENV SBT_VERSION 1.1.1
ENV SPARK_VERSION 2.2.0
ENV SPARK_DIST spark-$SPARK_VERSION-bin-hadoop2.6
ENV SPARK_ARCH $SPARK_DIST.tgz
ENV HOSTNAME bolt://$HOST:7687
VOLUME /workdir
WORKDIR /opt
# Install Scala
RUN \
  cd /root && \
  curl -o scala-$SCALA_VERSION.tgz http://downloads.typesafe.com/scala/$SCALA_VERSION/scala-$SCALA_VERSION.tgz && \
  tar -xf scala-$SCALA_VERSION.tgz && \
  rm scala-$SCALA_VERSION.tgz && \
  echo >> /root/.bashrc && \
  echo 'export PATH=~/scala-$SCALA_VERSION/bin:$PATH' >> /root/.bashrc
# Install SBT
RUN \
  curl -L -o sbt-$SBT_VERSION.deb https://dl.bintray.com/sbt/debian/sbt-$SBT_VERSION.deb && \
  dpkg -i sbt-$SBT_VERSION.deb && \
  rm sbt-$SBT_VERSION.deb
# Install Spark
RUN \
  cd /opt && \
  curl -o $SPARK_ARCH http://d3kbcqa49mib13.cloudfront.net/$SPARK_ARCH && \
  tar xvfz $SPARK_ARCH && \
  rm $SPARK_ARCH && \
  echo 'export PATH=$SPARK_DIST/bin:$PATH' >> /root/.bashrc
EXPOSE 9851 9852 4040 9092 9200 9300 5601 7474 7687 7473
CMD /workdir/runDemo.sh "$MAIN_CLASS"
Answer 0 (score: 0)
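The problem was that another Spark process was already running on the machine and was blocking the Spark stream. I listed all the processes with ps aux | grep spark and found the other running process. Killing that process and restarting the Spark Streaming application solved the problem.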
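As a rough sketch of that check, the helper below pulls the PIDs out of ps aux output so they can be reviewed before anything is killed. The function name spark_pids is hypothetical, and matching on the word "spark" assumes the stray process has it somewhere in its command line (as the jars/spark_consumer.jar invocation above does).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PID column of every `ps aux`-style line
# on stdin whose text mentions "spark" (case-insensitive).
spark_pids() {
  # The bracketed pattern [s]park matches "spark" but not the literal
  # text "[s]park", so this grep does not report its own command line.
  grep -i '[s]park' | awk '{print $2}'
}

# Typical use (review the list before killing anything):
#   ps aux | spark_pids
#   kill <pid>      # then restart the Spark Streaming application
```

Reading from stdin keeps the filter separate from ps, so it can be tried on sample lines first. Review the listed PIDs by hand, since anything with "spark" in its command line will match, including the application you want to keep running.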