
Is it possible to use Auto Scaling with Elastic MapReduce?

Time: 2015-04-24 22:02:50

Tags: amazon-web-services amazon-ec2

I would like to know whether I can use Auto Scaling to automatically scale Amazon EC2 capacity up or down based on the CPU utilization of Elastic MapReduce.

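For example, I start a MapReduce job with only 1 instance, but if that instance reaches 50% utilization, I want to use the Auto Scaling group I created to launch a new instance. Is this possible?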

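Do you know if that is possible? Or, because Elastic MapReduce is "elastic", does it automatically launch more instances when it needs them, without any configuration?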

2 answers:

Answer 0 (score: 2)

You need Qubole: http://www.qubole.com/blog/product/industrys-first-auto-scaling-hadoop-clusters/

We have never seen any of our users/customers use vanilla auto-scaling successfully with Hadoop. Hadoop is stateful. Nodes hold HDFS data and intermediate outputs. Deleting nodes based on cpu/memory just doesn't work. Adding nodes needs sophistication - this isn't a web site. One needs to look at the sizes of jobs submitted and the speed at which they are completing.

We run the largest Hadoop clusters, easily, on AWS (for our customers). And they auto-scale all the time. And they use spot instances. And it costs the same as EMR.

Answer 1 (score: 1)

No, Auto Scaling cannot be used with Amazon Elastic MapReduce (EMR).

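EMR can be scaled through API or command-line calls that add and remove task nodes (which do not host HDFS storage). Note that core nodes cannot be removed (because they host HDFS storage, removing them could lead to data loss). In fact, this is the only difference between core and task nodes.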
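As a rough illustration of resizing through the API, the sketch below uses the boto3 EMR client calls `list_instance_groups` and `modify_instance_groups` to change only the task instance groups, leaving master and core groups alone. The helper name `resize_task_groups` and the cluster ID in the usage note are hypothetical, and the client is passed in as a parameter on the assumption that the caller creates it with `boto3.client("emr")`. Pagination of the group listing is ignored here for brevity.

```python
def resize_task_groups(emr_client, cluster_id, target_count):
    """Set every TASK instance group of the cluster to target_count nodes.

    emr_client is assumed to be a boto3 EMR client (boto3.client("emr")).
    MASTER and CORE groups are deliberately left untouched, because core
    nodes host HDFS storage.
    """
    if target_count < 0:
        raise ValueError("target_count must be non-negative")

    # Only the first page of results is read; a large cluster would need
    # to follow the returned Marker.
    response = emr_client.list_instance_groups(ClusterId=cluster_id)

    changes = [
        {"InstanceGroupId": group["Id"], "InstanceCount": target_count}
        for group in response["InstanceGroups"]
        if group["InstanceGroupType"] == "TASK"
    ]

    # Skip the API call entirely when the cluster has no task groups.
    if changes:
        emr_client.modify_instance_groups(
            ClusterId=cluster_id, InstanceGroups=changes
        )
    return changes
```

A call might look like `resize_task_groups(boto3.client("emr"), "j-EXAMPLE", 4)`, where `j-EXAMPLE` stands in for a real cluster ID.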

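It is also possible to change the number of nodes from within an EMR "step". Steps are executed sequentially, so the cluster can be made larger before a step that requires heavy processing and scaled back down in a subsequent step.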

From the EMR Developer Guide:

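You can have a different number of slave nodes for each cluster step. You can also add a step to a running cluster to modify the number of slave nodes. Because all steps are guaranteed to run sequentially by default, you can specify the number of running slave nodes for any step.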

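CPU would not be a good metric on which to base scaling of an EMR cluster, because Hadoop keeps all nodes as busy as possible while a job is running. A better metric would be the number of jobs waiting, so that they can finish sooner.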
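To make the idea of scaling on waiting jobs concrete, here is a minimal sketch of a sizing rule that maps a pending-job count to a task-node count. The function name `desired_task_nodes`, the `jobs_per_node` ratio, and the node limits are all assumptions chosen for illustration; how the pending-job count is obtained is left to the caller.

```python
import math


def desired_task_nodes(pending_jobs, jobs_per_node=2, min_nodes=0, max_nodes=10):
    """Return how many task nodes to run for a given number of waiting jobs.

    jobs_per_node, min_nodes and max_nodes are illustrative defaults,
    not values recommended by EMR.
    """
    if pending_jobs < 0:
        raise ValueError("pending_jobs must be non-negative")
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")

    # One node per jobs_per_node waiting jobs, rounded up.
    wanted = math.ceil(pending_jobs / jobs_per_node)

    # Clamp to the configured limits so the cluster never shrinks below
    # min_nodes or grows past max_nodes.
    return max(min_nodes, min(max_nodes, wanted))
```

The result could then be used as the target count when resizing the task nodes through the EMR API.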

See also:

Stack Overflow: Can we add more Amazon Elastic Mapreduce instances into an existing Amazon Elastic Mapreduce instances?
Stack Overflow: Can Amazon Auto Scaling Service work with Elastic Map Reduce Service?