Google Cloud: Using gsutil to download data from AWS S3 to GCS

Date: 2017-10-30 22:44:03

Tags: google-cloud-storage gsutil

One of our collaborators has made some data available on AWS, and I'm trying to get it into our Google Cloud bucket using gsutil (only some of the files are useful to us, so I don't want to use the GUI available on GCS). The collaborator has given us the AWS bucket ID, the AWS access key ID, and the AWS secret access key.

I looked through the documentation on GCE and edited the ~/.boto file to incorporate the access keys. I restarted my terminal and tried an 'ls', but got the following error:

gsutil ls s3://cccc-ffff-03210/
AccessDeniedException: 403 AccessDenied
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied

Is there anything else I need to configure or run?

Thanks!

EDIT:

Thanks for the replies!

I installed the Cloud SDK, and I can access and run all gsutil commands on my Google Cloud Storage project. My problem is with trying to access (e.g. with an 'ls' command) the Amazon S3 bucket that was shared with me.

  1. I uncommented the two lines in the ~/.boto file and put in the access keys:

    # To add HMAC aws credentials for "s3://" URIs, edit and uncomment the
    # following two lines:
    aws_access_key_id = my_access_key
    aws_secret_access_key = my_secret_access_key
    
  2. Output of 'gsutil version -l':

      | => gsutil version -l
      
      my_gc_id
      gsutil version: 4.27
      checksum: 5224e55e2df3a2d37eefde57 (OK)
      boto version: 2.47.0
      python version: 2.7.10 (default, Oct 23 2015, 19:19:21) [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)]
      OS: Darwin 15.4.0
      multiprocessing available: True
      using cloud sdk: True
      pass cloud sdk credentials to gsutil: True
      config path(s): /Users/pc/.boto, /Users/pc/.config/gcloud/legacy_credentials/pc@gmail.com/.boto
      gsutil path: /Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gsutil
      compiled crcmod: True
      installed via package manager: False
      editable install: False
      
  3. Output with the -DD option:

        => gsutil -DD ls s3://my_amazon_bucket_id
        
        multiprocessing available: True
        using cloud sdk: True
        pass cloud sdk credentials to gsutil: True
        config path(s): /Users/pc/.boto, /Users/pc/.config/gcloud/legacy_credentials/pc@gmail.com/.boto
        gsutil path: /Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gsutil
        compiled crcmod: True
        installed via package manager: False
        editable install: False
        Command being run: /Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=my_gc_id -DD ls s3://my_amazon_bucket_id
        config_file_list: ['/Users/pc/.boto', '/Users/pc/.config/gcloud/legacy_credentials/pc@gmail.com/.boto']
        config: [('debug', '0'), ('working_dir', '/mnt/pyami'), ('https_validate_certificates', 'True'), ('debug', '0'), ('working_dir', '/mnt/pyami'), ('content_language', 'en'), ('default_api_version', '2'), ('default_project_id', 'my_gc_id')]
        DEBUG 1103 08:42:34.664643 provider.py] Using access key found in shared credential file.
        DEBUG 1103 08:42:34.664919 provider.py] Using secret key found in shared credential file.
        DEBUG 1103 08:42:34.665841 connection.py] path=/
        DEBUG 1103 08:42:34.665967 connection.py] auth_path=/my_amazon_bucket_id/
        DEBUG 1103 08:42:34.666115 connection.py] path=/?delimiter=/
        DEBUG 1103 08:42:34.666200 connection.py] auth_path=/my_amazon_bucket_id/?delimiter=/
        DEBUG 1103 08:42:34.666504 connection.py] Method: GET
        DEBUG 1103 08:42:34.666589 connection.py] Path: /?delimiter=/
        DEBUG 1103 08:42:34.666668 connection.py] Data: 
        DEBUG 1103 08:42:34.666724 connection.py] Headers: {}
        DEBUG 1103 08:42:34.666776 connection.py] Host: my_amazon_bucket_id.s3.amazonaws.com
        DEBUG 1103 08:42:34.666831 connection.py] Port: 443
        DEBUG 1103 08:42:34.666882 connection.py] Params: {}
        DEBUG 1103 08:42:34.666975 connection.py] establishing HTTPS connection: host=my_amazon_bucket_id.s3.amazonaws.com, kwargs={'port': 443, 'timeout': 70}
        DEBUG 1103 08:42:34.667128 connection.py] Token: None
        DEBUG 1103 08:42:34.667476 auth.py] StringToSign:
        GET
        
        
        Fri, 03 Nov 2017 12:42:34 GMT
        /my_amazon_bucket_id/
        DEBUG 1103 08:42:34.667600 auth.py] Signature:
        AWS RN8=
        DEBUG 1103 08:42:34.667705 connection.py] Final headers: {'Date': 'Fri, 03 Nov 2017 12:42:34 GMT', 'Content-Length': '0', 'Authorization': u'AWS AK6GJQ:EFVB8F7rtGN8=', 'User-Agent': 'Boto/2.47.0 Python/2.7.10 Darwin/15.4.0 gsutil/4.27 (darwin) google-cloud-sdk/164.0.0'}
        DEBUG 1103 08:42:35.179369 https_connection.py] wrapping ssl socket; CA certificate file=/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/third_party/boto/boto/cacerts/cacerts.txt
        DEBUG 1103 08:42:35.247599 https_connection.py] validating server certificate: hostname=my_amazon_bucket_id.s3.amazonaws.com, certificate hosts=['*.s3.amazonaws.com', 's3.amazonaws.com']
        send: u'GET /?delimiter=/ HTTP/1.1\r\nHost: my_amazon_bucket_id.s3.amazonaws.com\r\nAccept-Encoding: identity\r\nDate: Fri, 03 Nov 2017 12:42:34 GMT\r\nContent-Length: 0\r\nAuthorization: AWS AN8=\r\nUser-Agent: Boto/2.47.0 Python/2.7.10 Darwin/15.4.0 gsutil/4.27 (darwin) google-cloud-sdk/164.0.0\r\n\r\n'
        reply: 'HTTP/1.1 403 Forbidden\r\n'
        header: x-amz-bucket-region: us-east-1
        header: x-amz-request-id: 60A164AAB3971508
        header: x-amz-id-2: +iPxKzrW8MiqDkWZ0E=
        header: Content-Type: application/xml
        header: Transfer-Encoding: chunked
        header: Date: Fri, 03 Nov 2017 12:42:34 GMT
        header: Server: AmazonS3
        DEBUG 1103 08:42:35.326652 connection.py] Response headers: [('date', 'Fri, 03 Nov 2017 12:42:34 GMT'), ('x-amz-id-2', '+iPxKz1dPdgDxpnWZ0E='), ('server', 'AmazonS3'), ('transfer-encoding', 'chunked'), ('x-amz-request-id', '60A164AAB3971508'), ('x-amz-bucket-region', 'us-east-1'), ('content-type', 'application/xml')]
        DEBUG 1103 08:42:35.327029 bucket.py] <?xml version="1.0" encoding="UTF-8"?>
        <Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>6097164508</RequestId><HostId>+iPxKzrWWZ0E=</HostId></Error>
        DEBUG: Exception stack trace:
        Traceback (most recent call last):
          File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 577, in _RunNamedCommandAndHandleExceptions
            collect_analytics=True)
          File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 317, in RunNamedCommand
            return_code = command_inst.RunCommand()
          File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/commands/ls.py", line 548, in RunCommand
            exp_dirs, exp_objs, exp_bytes = ls_helper.ExpandUrlAndPrint(storage_url)
          File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/ls_helper.py", line 180, in ExpandUrlAndPrint
            print_initial_newline=False)
          File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/ls_helper.py", line 252, in _RecurseExpandUrlAndPrint
            bucket_listing_fields=self.bucket_listing_fields):
          File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 476, in IterAll
            expand_top_level_buckets=expand_top_level_buckets):
          File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 157, in __iter__
            fields=bucket_listing_fields):
          File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 413, in ListObjects
            self._TranslateExceptionAndRaise(e, bucket_name=bucket_name)
          File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 1471, in _TranslateExceptionAndRaise
            raise translated_exception
        AccessDeniedException: AccessDeniedException: 403 AccessDenied
        
        
        AccessDeniedException: 403 AccessDenied
        

2 Answers:

Answer 0 (score: 6)

I'll assume you were able to set up your gcloud credentials using gcloud init and gcloud auth login (or gcloud auth activate-service-account), and that you can successfully list and write objects in GCS.
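For reference, that initial setup amounts to something like the following sketch (the GCS bucket name is a placeholder):

# Authenticate the Cloud SDK and pick a default project.
gcloud init
gcloud auth login

# Sanity check: listing your own GCS bucket should succeed
# before you start troubleshooting the S3 side.
gsutil ls gs://your-gcs-bucket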

From there, you need two things: a properly configured AWS IAM role applied to the AWS user you're using, and a properly configured ~/.boto file.

AWS S3 IAM policy for bucket access

A policy like the following must be applied, either via a role granted to the user or an inline policy attached to the user.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::some-s3-bucket/*",
                "arn:aws:s3:::some-s3-bucket"
            ]
        }
    ]
}

The important part is that you have the ListBucket and GetObject actions, and that their resource scope includes at least the bucket (or a prefix thereof) that you wish to read from.
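As an illustration (this command is not part of the original answer), attaching the policy above inline to an IAM user with the AWS CLI might look like the sketch below; the user name, policy name, and file name are placeholders:

# Attach the JSON policy above (saved locally as policy.json) as an
# inline policy on the IAM user whose keys were shared with you.
aws iam put-user-policy \
    --user-name collaborator-user \
    --policy-name gsutil-read-access \
    --policy-document file://policy.json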

.boto file configuration

Interoperation between service providers is always a bit tricky. At the time of this writing, in order to support AWS Signature V4 (the only signature version supported universally across all AWS regions), you have to add a couple of extra properties to your ~/.boto file beyond just the credentials, in an [s3] group.

[Credentials]
aws_access_key_id = [YOUR AKID]
aws_secret_access_key = [YOUR SECRET AK]

[s3]
use-sigv4=True
host=s3.us-east-2.amazonaws.com

The use-sigv4 property cues Boto, via gsutil, to use AWS Signature V4 for requests. Unfortunately, this currently also requires that the host be specified in the configuration. Figuring out the host name is easy, as it follows the pattern s3.[BUCKET REGION].amazonaws.com.

If you have rsync/cp work across multiple S3 regions, you can handle this a few ways. You can switch between multiple config files by setting the BOTO_CONFIG environment variable before running the command, or you can override the setting on each run with a top-level -o argument.
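A sketch of that per-run override using gsutil's -o flag (the bucket name is a placeholder):

# Override the S3 host for a single invocation instead of editing ~/.boto.
gsutil -o "s3:host=s3.us-east-2.amazonaws.com" ls s3://some-s3-bucket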

Answer 1 (score: 0)

1. Generate your GCS credentials

If you downloaded the Cloud SDK and then ran gcloud init and gcloud auth login, gcloud should have configured OAuth2 credentials for the account you logged in with, giving you access to your GCS buckets (it does this by creating a separate boto file that gets loaded in addition to your ~/.boto file, if you have one).

If you're using standalone gsutil, run gsutil config to generate a config file at ~/.boto.

2. Add your AWS credentials to the ~/.boto file

The [Credentials] section of your ~/.boto file should have these two lines populated and uncommented:

aws_access_key_id = IDHERE
aws_secret_access_key = KEYHERE

If you've done that, then:

  • Make sure you didn't accidentally swap the values for the key and the ID.
  • Verify that you're loading the correct boto file(s); you can check this by
    running gsutil version -l and looking for the "config path(s):" line.
  • If you're still getting a 403, it's possible that you were given the wrong
    bucket name, or a key and ID corresponding to an account that doesn't have
    permission to list the contents of that bucket.
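Once the S3 'ls' works, the transfer itself can be done directly with gsutil. A sketch with placeholder bucket and path names, since only some of the files are needed:

# Copy selected objects from S3 into GCS.
gsutil -m cp s3://some-s3-bucket/path/file1.csv gs://your-gcs-bucket/data/

# Or mirror a whole prefix.
gsutil -m rsync -r s3://some-s3-bucket/path gs://your-gcs-bucket/data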