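I want to install some libraries on the core nodes of a cluster that is already running. Following the official documentation and its example section on installing libraries on the core nodes of a running cluster, I wrote a script and tried to run it. It reports both commands as successful, but when I check whether the first command actually ran, i.e. whether the script was downloaded with the AWS CLI, I cannot find the file. So I suspect the commands never actually run.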
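A command-level `Success` status does not show what happened on each node. One way to check is to ask SSM for the per-instance invocations and their output. This is a minimal sketch, assuming `ssm` is a boto3 SSM client; `invocation_report` is a hypothetical helper name. `list_command_invocations` with `Details=True` returns each plugin's output, which SSM truncates when it is long.

```python
from typing import Any, Dict, List


def invocation_report(ssm: Any, command_id: str) -> List[Dict[str, str]]:
    """Return per-instance status and output for one SSM command.

    `ssm` is assumed to be a boto3 SSM client, e.g. boto3.client("ssm").
    """
    rows: List[Dict[str, str]] = []
    kwargs: Dict[str, Any] = {"CommandId": command_id, "Details": True}
    while True:
        page = ssm.list_command_invocations(**kwargs)
        for inv in page["CommandInvocations"]:
            # With Details=True each invocation lists its plugins and their output.
            output = "".join(p.get("Output", "") for p in inv.get("CommandPlugins", []))
            rows.append({"InstanceId": inv["InstanceId"], "Status": inv["Status"], "Output": output})
        token = page.get("NextToken")
        if not token:
            return rows
        # Follow pagination until SSM stops returning a NextToken.
        kwargs["NextToken"] = token
```

For example, `invocation_report(boto3.client("ssm"), first_command)` would show which instances received the `aws s3 cp` command and what it printed; an empty list would mean the tag target reached no instances at all.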
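To dig deeper, I ran the commands manually, and they worked. I also double-checked the cluster ID and everything else to rule out "silly mistakes"; everything apart from the code itself looks fine.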
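One part of the setup that checking the cluster ID does not pin down is exactly which instances the `tag:environment` target reaches. A sketch of sending the command to explicit instance IDs instead, assuming `emr` and `ssm` are boto3 EMR and SSM clients; both helper names are hypothetical. It passes `InstanceStates=["RUNNING"]` because `list_instances` also returns instances terminated in roughly the last 30 days, and it follows the `Marker` pagination token.

```python
from typing import Any, Dict, List


def running_core_instance_ids(emr: Any, cluster_id: str) -> List[str]:
    """EC2 IDs of CORE instances that are currently RUNNING, across all pages."""
    ids: List[str] = []
    kwargs: Dict[str, Any] = {
        "ClusterId": cluster_id,
        "InstanceGroupTypes": ["CORE"],
        "InstanceStates": ["RUNNING"],  # skip recently terminated instances
    }
    while True:
        page = emr.list_instances(**kwargs)
        ids.extend(i["Ec2InstanceId"] for i in page["Instances"])
        marker = page.get("Marker")
        if not marker:
            return ids
        kwargs["Marker"] = marker


def send_to_instances(ssm: Any, instance_ids: List[str], commands: List[str]) -> str:
    """Run shell commands on explicit instance IDs rather than a tag target."""
    if not instance_ids:
        raise ValueError("no running core instances found")
    response = ssm.send_command(
        InstanceIds=instance_ids,
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": commands},
        TimeoutSeconds=3600,
    )
    return response["Command"]["CommandId"]
```

Targeting IDs directly also removes any dependency on the tag being visible to SSM at the moment `send_command` runs. For larger clusters, check the `InstanceIds` limit in the SendCommand API reference.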
The Python script is:
```python
# Install Python libraries on running cluster nodes
import sys
import time

from boto3 import client

try:
    clusterId = sys.argv[1]
    script = sys.argv[2]
except IndexError:
    print("Syntax: librariesSsm.py [ClusterId] [S3_Script_Path]")
    sys.exit(1)

emrclient = client('emr')

# Get list of core nodes
instances = emrclient.list_instances(
    ClusterId=clusterId, InstanceGroupTypes=['CORE'])['Instances']
instance_list = [x['Ec2InstanceId'] for x in instances]

# Attach tag to core nodes
ec2client = client('ec2')
ec2client.create_tags(Resources=instance_list, Tags=[
    {"Key": "environment", "Value": "coreNodeLibs"}])

ssmclient = client('ssm')

print("Download shell script from S3")
command = "aws s3 cp " + script + " /home/hadoop"
print("Command is {}".format(command))

try:
    print("Trying to exec first command.")
    first_command = ssmclient.send_command(
        Targets=[{"Key": "tag:environment", "Values": ["coreNodeLibs"]}],
        DocumentName='AWS-RunShellScript',
        Parameters={"commands": [command]},
        TimeoutSeconds=3600)['Command']['CommandId']
    print("First command is {}".format(first_command))

    # Wait for command to execute
    time.sleep(15)

    # first_command_status = ssmclient.list_commands(
    #     CommandId="d69ce0bf-a34e-4464-80e3-3a6325b05158",
    #     Filters=[
    #         {
    #             'key': 'Status',
    #             'value': 'SUCCESS'
    #         },
    #     ]
    # )['Commands'][0]['Status']

    first_command_status = ssmclient.list_commands(
        CommandId=first_command
    )['Commands'][0]['Status']
    print(first_command_status)

    second_command = ""
    second_command_status = ""

    # Only execute second command if first command is successful
    if first_command_status == 'Success':
        # Run shell script to install libraries
        second_command = ssmclient.send_command(
            Targets=[{"Key": "tag:environment", "Values": ["coreNodeLibs"]}],
            DocumentName='AWS-RunShellScript',
            Parameters={"commands": [
                "bash /home/hadoop/install_libraries.sh"]},
            TimeoutSeconds=3600)['Command']['CommandId']
        time.sleep(90)
        second_command_status = ssmclient.list_commands(
            CommandId=second_command
        )['Commands'][0]['Status']

    print("First command, " + first_command + ": " + first_command_status)
    print("Second command, " + second_command + ": " + second_command_status)
except Exception as e:
    print(e)
```
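The fixed `time.sleep(15)` and `time.sleep(90)` calls read the status at whatever point the command has reached when they wake up. A minimal sketch of polling until a terminal status instead, assuming `ssm` is a boto3 SSM client; `wait_for_command` and `completed_on_all_targets` are hypothetical helper names. It returns the whole `Command` record because `Success` on its own does not say how many instances were targeted; the sketch treats a `TargetCount` of 0 as "did not run anywhere".

```python
import time
from typing import Any, Callable, Dict

# Command-level statuses after which SSM no longer changes the status.
TERMINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut"}


def wait_for_command(
    ssm: Any,
    command_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll list_commands until the command reaches a terminal status."""
    waited = 0.0
    while True:
        commands = ssm.list_commands(CommandId=command_id)["Commands"]
        # Treat a not-yet-visible command as still pending.
        status = commands[0]["Status"] if commands else "Pending"
        if status in TERMINAL_STATUSES:
            return commands[0]
        if waited >= timeout_seconds:
            raise TimeoutError(f"command {command_id} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


def completed_on_all_targets(command: Dict[str, Any]) -> bool:
    """True only if the command succeeded and actually reached at least one instance."""
    target_count = command.get("TargetCount", 0)
    return (
        command["Status"] == "Success"
        and target_count > 0
        and command.get("CompletedCount", 0) == target_count
        and command.get("ErrorCount", 0) == 0
    )
```

The `sleep` parameter exists only so the loop can be exercised without real delays; in the script it would be left at its default.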
The shell script uploaded to S3 is:

```bash
sudo docker exec jupyterhub bash -c "python3 -m pip install pandas"
```
I expect to see some results, i.e. that the file is downloaded and that the library (pandas) is actually installed.
Also, I'm still not sure whether these commands run directly on the master node or inside the Docker container, which is what really matters.
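One way to answer this is to run a few diagnostic commands through the same `AWS-RunShellScript` document and compare their output per instance. According to the Amazon EMR JupyterHub documentation, JupyterHub runs in a Docker container named `jupyterhub` on the master node; if that holds for this cluster, `docker exec jupyterhub` would have no container to target on core nodes. `pandas_location_checks` is a hypothetical helper that only builds the command list; the container name defaults to the one used in the shell script.

```python
from typing import List


def pandas_location_checks(container: str = "jupyterhub") -> List[str]:
    """Shell commands reporting where pandas is importable on one node.

    Each check ends in `|| echo ...` so a failing check prints a readable
    message instead of an opaque error code.
    """
    probe = "import pandas; print(pandas.__version__)"
    return [
        "hostname",  # identifies which node produced the output
        f"python3 -c '{probe}' || echo 'pandas not importable on host'",
        "sudo docker ps --format '{{.Names}}' || echo 'docker not available'",
        f"sudo docker exec {container} python3 -c '{probe}' || echo 'pandas not importable in {container}'",
    ]
```

The list could be passed as `Parameters={"commands": pandas_location_checks()}` to `send_command`, targeting both the master node and one core node; the `docker ps` line shows whether a `jupyterhub` container exists on each.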