Launching a Spark cluster on AWS: AWS_ACCESS_KEY not recognized (and other errors)

Asked: 2018-07-23 13:55:27

Tags: amazon-web-services apache-spark putty

I'm unable to launch a Spark cluster on Amazon Web Services (I'm on a Windows machine). I know very little about Spark, Linux, or AWS, so I've been following the step-by-step guide at https://www.safaribooksonline.com/videos/using-r-for/9781491973035/9781491973035-video283162.

So far I've managed to connect to AWS using PuTTY and to download/unpack/install the relevant files from spark, github, and so on. Here is the directory listing in my PuTTY window:

Using username "ec2-user".
Authenticating with public key "imported-openssh-key"
Last login: Sun Jul 22 14:35:17 2018 from  
99-62-62-54.lightspeed.toldoh.sbcglobal.net

   __|  __|_  )
   _|  (     /   Amazon Linux 2 AMI
  ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
No packages needed for security; 4 packages available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-10-0-0-235 ~]$ ls
spark-2.0.0-bin-hadoop2.7  spark-2.0.0-bin-hadoop2.7.tgz  Spark3.pem     
spark-ec2
[ec2-user@ip-10-0-0-235 ~]$ cd spark-ec2
[ec2-user@ip-10-0-0-235 spark-ec2]$ ls
ami-list             github.hostkey       setup-slave.sh
CONTRIBUTING.md      lib                  spark
copy-dir             LICENSE              spark-ec2
copy-dir.sh          mapreduce            spark_ec2.py
create_image.sh      persistent-hdfs      spark-standalone
create-swap.sh       README.md            ssh-no-keychecking.sh
deploy.generic       resolve-hostname.sh  tachyon
deploy_templates.py  rstudio              templates
ephemeral-hdfs       scala
ganglia              setup.sh
[ec2-user@ip-10-0-0-235 spark-ec2]$

I also ran `aws configure` and entered the correct access keys:

[ec2-user@ip-10-0-0-235 spark-ec2]$ aws configure
AWS Access Key ID [****************DL5A]:
AWS Secret Access Key [****************KCQ5]:
Default region name [us-east-2]:
Default output format [None]:
[ec2-user@ip-10-0-0-235 spark-ec2]$
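(Side note: while writing this up I read that the spark-ec2 script talks to AWS through its bundled boto library, which, if I understand correctly, may look for credentials in environment variables rather than in the file that `aws configure` writes. I'm not sure that's the issue, but if it is, I'd guess something like this would expose the configured keys to the script:)

```shell
# Copy the credentials stored by "aws configure" into the environment
# variables that (I believe) boto/spark-ec2 checks for. "aws configure get"
# reads values back out of the saved AWS CLI configuration.
export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id)
export AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key)
```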

This is where things went wrong. I was told to enter the following command:

./spark-ec2 -k Spark3 -i /home/ec2-user/Spark3.pem -s 1 -t t2.large -r 
us-east-2 -z us-east-2a launch my_clusters

which is supposed to start the setup process (it takes a few minutes). Instead, I immediately got the following error message:

[ec2-user@ip-10-0-0-235 spark-ec2]$ ./spark-ec2 -k Spark3 -i /home/ec2-     
user/Spark3.pem -s 1 -t t2.large -r us-east-2 -z us-east-2a launch    
my_clusters
Setting up security groups...
Traceback (most recent call last):
File "./spark_ec2.py", line 1573, in <module>
main()
File "./spark_ec2.py", line 1565, in main
real_main()
File "./spark_ec2.py", line 1394, in real_main
(master_nodes, slave_nodes) = launch_cluster(conn, opts, cluster_name)
File "./spark_ec2.py", line 523, in launch_cluster
master_group = get_or_make_group(conn, cluster_name + "-master",
opts.vpc_id)
File "./spark_ec2.py", line 371, in get_or_make_group
groups = conn.get_all_security_groups()
AttributeError: 'NoneType' object has no attribute
'get_all_security_groups'
[ec2-user@ip-10-0-0-235 spark-ec2]$

I've tried a few different things, and each gives a different error message. First, I prefixed the command with `sudo`, and got this:

[ec2-user@ip-10-0-0-235 spark-ec2]$ sudo ./spark-ec2 -k Spark3 -i
/home/ec2-user/Spark3.pem -s 1 -t t2.large -r us-east-2 -z us-east-2a   
launch my_clusters
ERROR: The environment variable AWS_ACCESS_KEY_ID must be set
[ec2-user@ip-10-0-0-235 spark-ec2]$

That's odd, because I set the AWS access key when I ran `aws configure`. I tried setting it manually anyway, and got:

[ec2-user@ip-10-0-0-235 spark-ec2]$ set
AWS_ACCESS_KEY_ID=AKIAI624LY4HBRFIDL5A
[ec2-user@ip-10-0-0-235 spark-ec2]$ sudo ./spark-ec2 -k Spark3 -i
/home/ec2-user/Spark3.pem -s 1 -t t2.large -r us-east-2 -z us-east-2a
launch my_clusters
ERROR: The environment variable AWS_ACCESS_KEY_ID must be set

[ec2-user@ip-10-0-0-235 spark-ec2]$ $ export
AWS_ACCESS_KEY_ID=AKIAI624LY4HBRFIDL5A
-bash: $: command not found 
[ec2-user@ip-10-0-0-235 spark-ec2]$ sudo ./spark-ec2 -k Spark3 -i
/home/ec2-user/Spark3.pem -s 1 -t t2.large -r us-east-2 -z us-east-2a  
launch my_clusters
ERROR: The environment variable AWS_ACCESS_KEY_ID must be set
[ec2-user@ip-10-0-0-235 spark-ec2]$
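(For reference, I now suspect two of the attempts above were malformed shell on my part: `set VAR=value` sets bash positional parameters rather than exporting a variable, and in the second attempt I typed the `$` prompt as part of the `export` command. My understanding, as a Linux beginner, is that the correct sequence would look like this, with `sudo -E` so the exported variables survive into the sudo environment:)

```shell
# Export the credential into the current shell. The leading "$" in my
# earlier attempt was just the prompt, not part of the command.
export AWS_ACCESS_KEY_ID=AKIAI624LY4HBRFIDL5A
# The secret key is masked earlier in this post, so a placeholder here:
export AWS_SECRET_ACCESS_KEY="REPLACE_WITH_SECRET_KEY"

# Plain sudo discards the caller's environment; "sudo -E" preserves it, so
# the script would actually see the variables. (Commented out here because
# it launches real EC2 instances.)
# sudo -E ./spark-ec2 -k Spark3 -i /home/ec2-user/Spark3.pem -s 1 \
#     -t t2.large -r us-east-2 -z us-east-2a launch my_clusters
```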

So no matter what I do, the AWS access key doesn't seem to be recognized. Finally, I tried leaving some of the options out of the command, and got this:

[ec2-user@ip-10-0-0-235 spark-ec2]$ ./spark-ec2 -k Spark3 -i /home/ec2-
user/Spark3.pem -s 1 launch my_clusters
Setting up security groups...
Searching for existing cluster my_clusters in region us-east-1...
Spark AMI: ami-5bb18832
Launching instances...
ERROR:boto:400 Bad Request
ERROR:boto:<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidKeyPair.NotFound</Code><Message>The
key pair 'Spark3' does not exist</Message></Error></Errors>
<RequestID>4586492c-ca39-449a-9130-182887b2185c</RequestID></Response>
Traceback (most recent call last):
File "./spark_ec2.py", line 1573, in <module>
main()
File "./spark_ec2.py", line 1565, in main
real_main()
File "./spark_ec2.py", line 1394, in real_main
(master_nodes, slave_nodes) = launch_cluster(conn, opts, cluster_name)
File "./spark_ec2.py", line 715, in launch_cluster
instance_profile_name=opts.instance_profile_name)
File "/home/ec2-user/spark-ec2/lib/boto-2.34.0/boto/ec2/image.py", line
329, in run
tenancy=tenancy, dry_run=dry_run)
File "/home/ec2-user/spark-ec2/lib/boto-2.34.0/boto/ec2/connection.py",
line 974, in run_instances
verb='POST')
File "/home/ec2-user/spark-ec2/lib/boto-2.34.0/boto/connection.py", line     
1204, in get_object
raise self.ResponseError(response.status, response.reason, body)
boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidKeyPair.NotFound</Code><Message>The
key pair 'Spark3' does not exist</Message></Error></Errors> 
<RequestID>4586492c-ca39-449a-9130-182887b2185c</RequestID></Response>
[ec2-user@ip-10-0-0-235 spark-ec2]$
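(One thing I notice in that last output: with the `-r` flag omitted, the script searched in us-east-1, while I set everything up in us-east-2. I gather EC2 key pairs are per-region, so 'Spark3' would only exist in us-east-2, which might explain the `InvalidKeyPair.NotFound` error. If so, I'd guess the fix is to keep `-r us-east-2`, or to import the same key into us-east-1 as well, roughly like this — untested on my end:)

```shell
# Key pairs are regional; list what each region actually has.
aws ec2 describe-key-pairs --region us-east-2
aws ec2 describe-key-pairs --region us-east-1

# If needed, import the same public key into us-east-1 too. This assumes
# the public half of the key is available as Spark3.pub, which is just an
# illustration (I only have the Spark3.pem private key on this machine).
aws ec2 import-key-pair --key-name Spark3 \
    --public-key-material file://Spark3.pub --region us-east-1
```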

So I'm really at a loss. In the tutorial video it all works perfectly, and I haven't been able to find any help online for these specific error messages.

Any help would be greatly appreciated.

0 Answers