Unable to install libraries on the core nodes of a running cluster

Time: 2019-06-06 13:46:21

Tags: python python-3.x amazon-web-services jupyter-notebook amazon-emr

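I want to install some libraries on the core nodes of a cluster that is already running. I followed the example in the official documentation, in the section on installing libraries on the core nodes of a running cluster.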

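I wrote the script and ran it. It reports both commands as successful, but when I checked whether the first command had actually run, i.e. whether the script had been downloaded with the AWS CLI, I could not find the file. So I think the commands are not running at all.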
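The aggregate `Status` that `list_commands` returns does not show which instances the command actually reached or what each of them printed. A minimal sketch for checking that per node, assuming the boto3 SSM client's `list_command_invocations` call with `Details=True` (which includes each plugin's `Output`); `summarize_invocations` is a hypothetical helper name. If it returns an empty dictionary, the command probably never reached any node, for example because the tag target matched no instances.

```python
from typing import Any, Dict, Tuple


def summarize_invocations(ssm_client: Any, command_id: str) -> Dict[str, Tuple[str, str]]:
    """Hypothetical helper: map each targeted instance ID to (status, combined output).

    `ssm_client` is assumed to be a boto3 SSM client (or anything with the same
    `list_command_invocations` method).
    """
    results: Dict[str, Tuple[str, str]] = {}
    kwargs = {"CommandId": command_id, "Details": True}
    while True:
        page = ssm_client.list_command_invocations(**kwargs)
        for inv in page.get("CommandInvocations", []):
            # With Details=True each invocation lists its plugins and their output.
            output = "".join(p.get("Output", "") for p in inv.get("CommandPlugins", []))
            results[inv["InstanceId"]] = (inv["Status"], output)
        # Follow pagination until there are no more pages.
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return results
```

Usage, once the first command has finished: `print(summarize_invocations(ssmclient, first_command))`.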

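To dig deeper, I ran the commands manually, and they worked. I also double-checked the cluster ID and everything else to rule out "silly mistakes"; everything except the code seems fine.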

The Python script is:

# Install Python libraries on running cluster nodes
import sys
import time

from boto3 import client

try:
    clusterId = sys.argv[1]
    script = sys.argv[2]
except IndexError:
    print("Syntax: librariesSsm.py [ClusterId] [S3_Script_Path]")
    sys.exit(1)

emrclient = client('emr')

# Get list of core nodes
instances = emrclient.list_instances(
    ClusterId=clusterId, InstanceGroupTypes=['CORE'])['Instances']
instance_list = [x['Ec2InstanceId'] for x in instances]

# Attach tag to core nodes so SSM can target them by tag
ec2client = client('ec2')
ec2client.create_tags(Resources=instance_list, Tags=[
                      {"Key": "environment", "Value": "coreNodeLibs"}])

ssmclient = client('ssm')

print("Download shell script from S3")
command = "aws s3 cp " + script + " /home/hadoop"
print("Command is {}".format(command))
try:
    print("Trying to exec first command.")
    first_command = ssmclient.send_command(Targets=[{"Key": "tag:environment", "Values": ["coreNodeLibs"]}],
                                           DocumentName='AWS-RunShellScript',
                                           Parameters={"commands": [command]},
                                           TimeoutSeconds=3600)['Command']['CommandId']
    print("First command is {}".format(first_command))
    # Wait a fixed time for the command to execute
    time.sleep(15)

    # first_command_status = ssmclient.list_commands(
    #     CommandId="d69ce0bf-a34e-4464-80e3-3a6325b05158",
    #     Filters=[
    #         {
    #             'key': 'Status',
    #             'value': 'SUCCESS'
    #         },
    #     ]
    # )['Commands'][0]['Status']
    first_command_status = ssmclient.list_commands(
        CommandId=first_command
    )['Commands'][0]['Status']
    print(first_command_status)

    second_command = ""
    second_command_status = ""

    # Only execute second command if first command is successful
    if first_command_status == 'Success':
        # Run shell script to install libraries
        second_command = ssmclient.send_command(Targets=[{"Key": "tag:environment", "Values": ["coreNodeLibs"]}],
                                                DocumentName='AWS-RunShellScript',
                                                Parameters={"commands": [
                                                    "bash /home/hadoop/install_libraries.sh"]},
                                                TimeoutSeconds=3600)['Command']['CommandId']

        time.sleep(90)
        second_command_status = ssmclient.list_commands(
            CommandId=second_command
        )['Commands'][0]['Status']

        print("First command, " + first_command + ": " + first_command_status)
        print("Second command, " + second_command + ": " + second_command_status)

except Exception as e:
    print(e)
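The script above reads each status after a fixed 15 or 90 second sleep, so it can read `InProgress` and then silently skip the second command. A minimal sketch of polling until the command reaches a terminal state instead; `wait_for_command`, `poll_seconds` and `max_wait` are hypothetical names, and the terminal set is assumed from the `list_commands` status values (`Pending`, `InProgress`, `Cancelling` are treated as still running).

```python
import time
from typing import Any

# Statuses after which an SSM command will not change any more (assumed set).
TERMINAL_STATUSES = {"Success", "Cancelled", "Failed", "TimedOut"}


def wait_for_command(ssm_client: Any, command_id: str,
                     poll_seconds: float = 5.0, max_wait: float = 3600.0) -> str:
    """Hypothetical helper: poll list_commands until a terminal status or max_wait.

    Returns the last status seen, which may still be non-terminal if max_wait
    was reached first.
    """
    waited = 0.0
    while True:
        status = ssm_client.list_commands(CommandId=command_id)["Commands"][0]["Status"]
        if status in TERMINAL_STATUSES or waited >= max_wait:
            return status
        time.sleep(poll_seconds)
        waited += poll_seconds
```

Usage: replace `time.sleep(15)` and the following `list_commands` call with `first_command_status = wait_for_command(ssmclient, first_command)`, and likewise for the second command.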

The shell script uploaded to S3 is:

sudo docker exec jupyterhub bash -c "python3 -m pip install pandas"

I expected to see results, i.e. the file downloaded and the library (pandas) actually installed.

Also, I am still not sure whether these commands run directly on the master node or inside the Docker container, which is what really matters.
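As far as I understand, `AWS-RunShellScript` runs the listed commands through the SSM Agent on the EC2 instance itself, so in the uploaded script only the part passed to `docker exec` would run inside a container. A sketch for checking this on each node, assuming the instance IDs from `list_instances` are passed directly through `send_command`'s `InstanceIds` parameter instead of the tag target; `run_diagnostics` and `DIAGNOSTIC_COMMANDS` are hypothetical names. Right after `send_command`, `get_command_invocation` may briefly report that the invocation does not exist yet, so the sketch sleeps before each poll.

```python
import time
from typing import Any, Dict, List, Tuple

# Hypothetical diagnostic commands: which host, which user, which containers.
DIAGNOSTIC_COMMANDS = [
    "hostname",
    "whoami",
    "sudo docker ps --format '{{.Names}}' || echo 'docker not available'",
]

STILL_RUNNING = {"Pending", "InProgress", "Delayed", "Cancelling"}


def run_diagnostics(ssm_client: Any, instance_ids: List[str],
                    poll_seconds: float = 5.0, max_polls: int = 60) -> Dict[str, Tuple[str, str]]:
    """Hypothetical helper: run DIAGNOSTIC_COMMANDS and return {instance: (status, stdout)}.

    `max_polls` must be at least 1.
    """
    command_id = ssm_client.send_command(
        InstanceIds=instance_ids,
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": DIAGNOSTIC_COMMANDS},
    )["Command"]["CommandId"]

    outputs: Dict[str, Tuple[str, str]] = {}
    for instance_id in instance_ids:
        for _ in range(max_polls):
            time.sleep(poll_seconds)  # give the agent time to register the invocation
            inv = ssm_client.get_command_invocation(CommandId=command_id, InstanceId=instance_id)
            if inv["Status"] not in STILL_RUNNING:
                break
        outputs[instance_id] = (inv["Status"], inv["StandardOutputContent"])
    return outputs
```

Usage: `print(run_diagnostics(ssmclient, instance_list))`. The `hostname` line shows which node answered, and the `docker ps` line shows whether a `jupyterhub` container exists on that node at all.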
