Adding or removing nodes from an existing GCE hadoop/spark cluster with bdutil

Time: 2015-02-11 12:37:56

Tags: google-cloud-platform google-hadoop

I started up a Spark cluster on Google Compute Engine, backed by Google Cloud Storage and deployed with bdutil (from the GoogleCloudPlatform GitHub), like this:

./bdutil -e bigquery_env.sh,datastore_env.sh,extensions/spark/spark_env.sh -b myhdfsbucket deploy

I expect I may want to start with a 2-node cluster (the default) and later add another worker node to handle a big job that needs to run. If possible, I'd like to do this without completely tearing down and redeploying the cluster.

I've tried redeploying with the same command but a different number of nodes, and also running "create" followed by "run_command_group install_connectors", as shown below, but for each of these I get errors about nodes that already exist, e.g.

./bdutil -n 3 -e bigquery_env.sh,datastore_env.sh,extensions/spark/spark_env.sh -b myhdfsbucket deploy

./bdutil -n 3 -b myhdfsbucket create
./bdutil -n 3 -t workers -b myhdfsbucket run_command_group install_connectors

I've also tried snapshotting and cloning one of the workers that's already running, but not all of the services seem to start up correctly, and I'm a bit out of my depth there.

Any guidance on how I can/should add and/or remove nodes from an existing cluster?

1 Answer:

Answer 0 (score: 3):

Update: We've added resize_env.sh to the base bdutil repo, so you no longer need to go to my fork.

Original answer:

There's no official support for resizing a bdutil-deployed cluster yet, but it's certainly something we've discussed before, and in fact it's fairly feasible to put together some basic resize support. This may take a different form once it's merged into the main branch, but I've pushed a first draft of resize support to my fork of bdutil. It's implemented in two commits: one allows skipping all "master" operations (including create, run_command, delete, etc.), and the other adds the resize_env.sh file.

I haven't tested it against every combination of the other bdutil extensions, but I've at least run it successfully with the base bdutil_env.sh plus extensions/spark/spark_env.sh. In theory it should also work with your bigquery and datastore extensions. To use it in your case:

# Assuming you initially deployed with this command (default n == 2)
./bdutil -e bigquery_env.sh,datastore_env.sh,extensions/spark/spark_env.sh -b myhdfsbucket -n 2 deploy

# Before this step, edit resize_env.sh and set NEW_NUM_WORKERS to what you want.
# Currently it defaults to 5.
# Deploy only the new workers, e.g. {hadoop-w-2, hadoop-w-3, hadoop-w-4}:
./bdutil -e bigquery_env.sh,datastore_env.sh,extensions/spark/spark_env.sh -b myhdfsbucket -n 2 -e resize_env.sh deploy

# Explicitly start the Hadoop daemons on just the new workers:
./bdutil -e bigquery_env.sh,datastore_env.sh,extensions/spark/spark_env.sh -b myhdfsbucket -n 2 -e resize_env.sh run_command -t workers -- "service hadoop-hdfs-datanode start && service hadoop-mapreduce-tasktracker start"

# If using Spark as well, explicitly start the Spark daemons on the new workers:
./bdutil -e bigquery_env.sh,datastore_env.sh,extensions/spark/spark_env.sh -b myhdfsbucket -n 2 -e resize_env.sh run_command -t workers -u extensions/spark/start_single_spark_worker.sh -- "./start_single_spark_worker.sh"

# From now on, it's as if you originally turned up your cluster with "-n 5".
# When deleting, remember to include those extra workers:
./bdutil -b myhdfsbucket -n 5 delete
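
As a side note, here is a minimal sketch of the resize_env.sh edit mentioned in the comments above, plus a quick sanity check afterwards. It assumes resize_env.sh contains a plain NEW_NUM_WORKERS=... assignment, that your master is named hadoop-m (the bdutil default), and the Hadoop 1.x install path referenced at the end of this answer; treat it as an illustration, not part of bdutil itself:

# Bump the target worker count before the resize deploy (e.g. from 2 to 3 workers);
# assumes resize_env.sh has a single NEW_NUM_WORKERS=... line:
sed -i 's/^NEW_NUM_WORKERS=.*/NEW_NUM_WORKERS=3/' resize_env.sh

# After starting the daemons on the new workers, check that the new DataNodes
# registered with the NameNode (run on the master, here via gcloud ssh):
gcloud compute ssh hadoop-m --command \
  "sudo -u hadoop /home/hadoop/hadoop-install/bin/hadoop dfsadmin -report | grep '^Name:'"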

In general, the best-practice recommendation is to condense your configuration into a file rather than always passing flags. For example, in your case you might want a file called my_base_env.sh:

import_env bigquery_env.sh
import_env datastore_env.sh
import_env extensions/spark/spark_env.sh

NUM_WORKERS=2
CONFIGBUCKET=myhdfsbucket

Then the resize commands are much shorter:

# Assuming you initially deployed with this command (default n == 2)
./bdutil -e my_base_env.sh deploy

# Before this step, edit resize_env.sh and set NEW_NUM_WORKERS to what you want.
# Currently it defaults to 5.
# Deploy only the new workers, e.g. {hadoop-w-2, hadoop-w-3, hadoop-w-4}:
./bdutil -e my_base_env.sh -e resize_env.sh deploy

# Explicitly start the Hadoop daemons on just the new workers:
./bdutil -e my_base_env.sh -e resize_env.sh run_command -t workers -- "service hadoop-hdfs-datanode start && service hadoop-mapreduce-tasktracker start"

# If using Spark as well, explicitly start the Spark daemons on the new workers:
./bdutil -e my_base_env.sh -e resize_env.sh run_command -t workers -u extensions/spark/start_single_spark_worker.sh -- "./start_single_spark_worker.sh"

# From now on, it's as if you originally turned up your cluster with "-n 5".
# When deleting, remember to include those extra workers:
./bdutil -b myhdfsbucket -n 5 delete

Finally, this isn't quite identical to having originally deployed the cluster with -n 5; in that case, the files /home/hadoop/hadoop-install/conf/slaves and /home/hadoop/spark-install/conf/slaves on the master node would already list the new nodes, whereas after a resize they will be missing them. If you plan to use /home/hadoop/hadoop-install/bin/[stop|start]-all.sh or /home/hadoop/spark-install/sbin/[stop|start]-all.sh, you can manually SSH to the master node and edit those files to add the new nodes to the list; if not, there's no need to change those slaves files at all.
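
For example, a hedged sketch of that manual edit, assuming the default hadoop-m / hadoop-w-* instance names from the examples above and the same three new workers; adjust the host names, worker count, and file permissions (sudo vs. the hadoop user) to your setup:

# Append the new workers to both slaves files on the master (hypothetical host names):
gcloud compute ssh hadoop-m --command '
  for f in /home/hadoop/hadoop-install/conf/slaves /home/hadoop/spark-install/conf/slaves; do
    for w in hadoop-w-2 hadoop-w-3 hadoop-w-4; do
      echo "$w" | sudo tee -a "$f" > /dev/null
    done
  done'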