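I'd like to know whether there is a way to list all the information about an EMR cluster. I know the AWS CLI can do something like this with aws emr list-steps --cluster-id ID.
That prints all the information about every step in the cluster. I want to do the same thing with Python and boto, but I'm wondering whether boto's EMR module has an option that lists all the information at once (like the AWS CLI output). Currently I have to fetch each piece of information with a specific call, for example: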
>>> conn.list_steps('j-2J699C85LW1R6').steps
[<boto.emr.emrobject.StepSummary object at 0x107785ad0>,
<boto.emr.emrobject.StepSummary object at 0x107798b90>,
<boto.emr.emrobject.StepSummary object at 0x107798d90>,
<boto.emr.emrobject.StepSummary object at 0x10778e650>,
<boto.emr.emrobject.StepSummary object at 0x10778ea90>,]
>>> conn.list_steps('j-2J699C85LW1R6').steps[0].id
u's-2LLDFU54O55DJ'
>>> conn.list_steps('j-2J699C85LW1R6').steps[0].status.state
u'COMPLETED'
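There are many small attributes like these, e.g. timeline.enddatetime, config.args, actiononfailure, etc., and I'd like to know whether there is a simple command that retrieves all of this information in one call and returns JSON or something similar.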
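If switching from the legacy boto library to boto3 is an option (an assumption; the question uses boto), boto3 returns responses as plain Python dicts rather than objects, so turning them into JSON is mostly a matter of handling datetime values. A minimal sketch for the cluster-level details, using the boto3 EMR client's describe_cluster call; the helper name cluster_as_json is hypothetical:

```python
import json


def cluster_as_json(emr_client, cluster_id):
    """Return everything DescribeCluster reports for one cluster as JSON.

    Hypothetical helper. Assumes emr_client is a boto3 EMR client; the legacy
    boto library returns objects instead of dicts, so this does not apply to it.
    """
    response = emr_client.describe_cluster(ClusterId=cluster_id)
    # boto3 responses are plain dicts, but timestamps come back as datetime
    # objects, which json.dumps rejects; default=str turns them into strings.
    return json.dumps(response["Cluster"], indent=2, default=str)
```

Usage, assuming configured AWS credentials: print(cluster_as_json(boto3.client("emr"), "j-2J699C85LW1R6")).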
Answer 0 (score: 0)
You can use the describe_step method to get the additional details:
http://boto.cloudhackers.com/en/latest/ref/emr.html
describe_step(cluster_id, step_id)
Describe an Elastic MapReduce step
Parameters:
cluster_id (str) – The cluster id of interest
step_id (str) – The step id of interest
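With the legacy boto library shown in the question, combining list_steps with describe_step could look like the sketch below. It assumes conn is a boto EMR connection and uses only the two methods that appear in the question and in the documentation above; describe_all_steps is a hypothetical helper name, and the result is still a list of boto objects, not JSON.

```python
def describe_all_steps(conn, cluster_id):
    """Hypothetical helper: fetch the full description of every step.

    Assumes conn is a legacy boto EMR connection exposing
    list_steps(cluster_id) and describe_step(cluster_id, step_id).
    """
    details = []
    # list_steps(...).steps yields StepSummary objects, each with an .id
    for summary in conn.list_steps(cluster_id).steps:
        # describe_step returns the detailed step object for one step id
        details.append(conn.describe_step(cluster_id, summary.id))
    return details
```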
Answer 1 (score: 0)
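There isn't a single call, but you can get the list of steps and then iterate over it, calling describe_step on each step. Here are a couple of functions from my full example on GitHub: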
import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def list_steps(cluster_id, emr_client):
    """
    Gets a list of steps for the specified cluster. In this example, steps in every
    state are returned, including completed and failed steps.

    :param cluster_id: The ID of the cluster.
    :param emr_client: The Boto3 EMR client object.
    :return: The list of steps for the specified cluster.
    """
    try:
        # ListSteps is paginated: if the response contains a 'Marker' key,
        # only the first page of steps is returned here.
        response = emr_client.list_steps(ClusterId=cluster_id)
        steps = response['Steps']
        logger.info("Got %s steps for cluster %s.", len(steps), cluster_id)
    except ClientError:
        logger.exception("Couldn't get steps for cluster %s.", cluster_id)
        raise
    else:
        return steps


def describe_step(cluster_id, step_id, emr_client):
    """
    Gets detailed information about the specified step, including the current state of
    the step.

    :param cluster_id: The ID of the cluster.
    :param step_id: The ID of the step.
    :param emr_client: The Boto3 EMR client object.
    :return: The retrieved information about the specified step.
    """
    try:
        response = emr_client.describe_step(ClusterId=cluster_id, StepId=step_id)
        step = response['Step']
        logger.info("Got data for step %s.", step_id)
    except ClientError:
        logger.exception("Couldn't get data for step %s.", step_id)
        raise
    else:
        return step
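Putting the two pieces together, a sketch that returns every step's details as one JSON string might look like this. It assumes emr_client is a boto3 EMR client (for example boto3.client("emr")); all_steps_as_json is a hypothetical name. Because ListSteps results are paginated, it uses the client's list_steps paginator instead of a single list_steps call, and default=str converts the datetime values boto3 returns so that json.dumps accepts them.

```python
import json


def all_steps_as_json(emr_client, cluster_id):
    """Hypothetical helper: describe every step of a cluster and return JSON.

    Assumes emr_client is a boto3 EMR client.
    """
    details = []
    # The paginator follows the 'Marker' token across all ListSteps pages.
    paginator = emr_client.get_paginator("list_steps")
    for page in paginator.paginate(ClusterId=cluster_id):
        for summary in page["Steps"]:
            response = emr_client.describe_step(
                ClusterId=cluster_id, StepId=summary["Id"]
            )
            details.append(response["Step"])
    # Timeline fields are datetime objects; default=str stringifies them.
    return json.dumps(details, indent=2, default=str)
```

This makes one describe_step request per step, so for clusters with many steps the summaries from list_steps alone may already carry enough (they include Config, ActionOnFailure and Status).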