What does one enter on the command line to run Spark in a Bokeh serve app? Do I simply separate the two command-line entries with &&?

Date: 2018-11-27 00:53:09

Tags: pyspark bokeh dask dask-distributed

My attempt does not work: /usr/local/spark/spark-2.3.2-bin-hadoop2.7/bin/spark-submit --driver-memory 6g --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.2 runspark.py && bokeh serve --show bokeh_app

runspark.py contains the instantiation of Spark, and bokeh_app is the folder of the Bokeh server app. Spark is being used to update a streaming Dask dataframe.

WHAT HAPPENS: The Spark instance starts up and loads as it normally would without the Bokeh server. However, as soon as the Bokeh server app kicks in (i.e., the web page opens), the Spark instance shuts down. It doesn't send back any errors in the console output. OUTPUT BELOW:

2018-11-26 21:04:05 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4f0492c9{/static/sql,null,AVAILABLE,@Spark}
2018-11-26 21:04:06 INFO StateStoreCoordinatorRef:54 - Registered StateStoreCoordinator endpoint
2018-11-26 21:04:06 INFO SparkContext:54 - Invoking stop() from shutdown hook
2018-11-26 21:04:06 INFO AbstractConnector:318 - Stopped Spark@4f3c4272{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
2018-11-26 21:04:06 INFO SparkUI:54 - Stopped Spark web UI at http://192.168.1.25:4041
2018-11-26 21:04:06 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-11-26 21:04:06 INFO MemoryStore:54 - MemoryStore cleared
2018-11-26 21:04:06 INFO BlockManager:54 - BlockManager stopped
2018-11-26 21:04:06 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2018-11-26 21:04:07 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-11-26 21:04:07 INFO SparkContext:54 - Successfully stopped SparkContext
2018-11-26 21:04:07 INFO ShutdownHookManager:54 - Shutdown hook called
2018-11-26 21:04:07 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-c42ce0b3-d49e-48ce-962c-277b42166267
2018-11-26 21:04:07 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-bd448b2e-6b0f-467a-9e43-689542c42a6f
2018-11-26 21:04:07 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-bd448b2e-6b0f-467a-9e43-689542c42a6f/pyspark-117d2a10-7cb9-4eb3-b4d0-f92f9046522c
2018-11-26 21:04:08,542 Starting Bokeh server version 0.13.0 (running on Tornado 5.1.1)
2018-11-26 21:04:08,547 Bokeh app running at: http://localhost:5006/aion_analytics
2018-11-26 21:04:08,547 Starting Bokeh server with process id: 10769
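Note that this log is exactly what `&&` is specified to do in a shell: the second command starts only after the first one exits successfully, so spark-submit and the Bokeh server never run at the same time. A minimal sketch of the difference, using `sleep` as a stand-in for a long-running process (not from the question):

```shell
# '&&' is sequential: the right-hand command starts only after the
# left-hand command exits with status 0.
# '&' backgrounds a command, so both can run concurrently:
sleep 1 &                 # stand-in for a long-running process
BG_PID=$!                 # remember its process id
echo "second command runs while the first is still alive"
wait "$BG_PID"            # block until the background process finishes
```

Backgrounding with `&` keeps both processes alive but in separate processes, which is why the accepted answer below takes a different route and puts both in one process instead.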

1 Answer:

Answer 0 (score: 0)

OK, I found the answer. The idea is simply to embed the Bokeh server in the pyspark code instead of running the Bokeh server from the command line. Use the spark-submit command as usual.

https://github.com/bokeh/bokeh/blob/1.0.1/examples/howto/server_embed/standalone_embed.py

I did exactly what the link above shows.
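For reference, the pattern in that linked example looks roughly like the sketch below: a Bokeh `Server` started inside the same Python script that spark-submit runs, so Spark driver and Bokeh server share one process. The plot contents here are placeholders, not taken from the question; in runspark.py the streaming Dask dataframe would feed the plot's data source inside `bkapp`.

```python
from bokeh.layouts import column
from bokeh.plotting import figure
from bokeh.server.server import Server


def bkapp(doc):
    # Placeholder app: a real runspark.py would wire the streaming
    # dask dataframe into a ColumnDataSource here.
    p = figure(title="placeholder plot")
    p.line([0, 1, 2], [0, 1, 4])
    doc.add_root(column(p))


if __name__ == "__main__":
    # Embed the Bokeh server in the same process as the Spark driver,
    # so a single `spark-submit runspark.py` serves the app itself.
    server = Server({"/": bkapp}, num_procs=1)
    server.start()
    server.io_loop.add_callback(server.show, "/")  # open the browser tab
    server.io_loop.start()                         # blocks; Spark work runs via callbacks
```

Because `server.io_loop.start()` blocks, any ongoing Spark/Dask updates have to be scheduled as periodic or next-tick callbacks on the Bokeh document rather than in a plain loop after it.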