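AWS Data Pipeline: Could not validate S3 Access [permission warning]

Date: 2019-05-29 08:19:31

Tags: amazon-web-services amazon-s3 amazon-redshift amazon-iam amazon-data-pipeline

I am evaluating AWS database services to pick the most efficient one. The goal is to load data from JSON files in an S3 bucket into Redshift every 5 minutes.

I am currently trying to automate the ETL with AWS Data Pipeline. I have been following the AWS tutorial "Copy Data to Amazon Redshift Using the AWS Data Pipeline Console", which is simple and clear.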

https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-copydata-redshift-create.html

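I set up a cluster on Redshift and a bucket on S3, and created all the roles and policies with all the required permissions.

Now, after creating the pipeline and pressing "Activate", a warning appears:

Errors/Warnings: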

Object:Ec2Instance
WARNING: Could not validate S3 Access for role. Please ensure role ('DataPipelineDefaultRole') has s3:Get*, s3:List*, s3:Put* and sts:AssumeRole permissions for DataPipeline.

I am sure that both my role and my resource role have s3:Get*, s3:List*, s3:Put* and sts:AssumeRole.

In fact, both of them have FullAccess to every service I need.

DataPipelineDefaultRole policy:

{
"Version": "2012-10-17",
"Statement": [
    {
        "Sid": "VisualEditor0",
        "Effect": "Allow",
        "Action": "iam:CreateServiceLinkedRole",
        "Resource": "*",
        "Condition": {
            "StringLike": {
                "iam:AWSServiceName": [
                    "elasticmapreduce.amazonaws.com",
                    "spot.amazonaws.com"
                ]
            }
        }
    },
    {
        "Sid": "VisualEditor1",
        "Effect": "Allow",
        "Action": [
            "ec2:AuthorizeSecurityGroupIngress",
            "sdb:Select*",
            "sqs:ReceiveMessage",
            "s3:Get*",
            "sqs:GetQueue*",
            "s3:CreateBucket",
            "sns:Unsubscribe",
            "s3:List*",
            "datapipeline:EvaluateExpression",
            "ec2:StartInstances",
            "dynamodb:DescribeTable",
            "sqs:Delete*",
            "iam:ListAttachedRolePolicies",
            "ec2:RevokeSecurityGroupEgress",
            "dynamodb:GetItem",
            "sns:Subscribe",
            "iam:ListRolePolicies",
            "s3:DeleteObject",
            "sdb:BatchPutAttributes",
            "iam:GetRole",
            "dynamodb:BatchGetItem",
            "redshift:DescribeClusterSecurityGroups",
            "ec2:CreateTags",
            "ec2:DeleteNetworkInterface",
            "ec2:RunInstances",
            "dynamodb:Scan",
            "rds:DescribeDBSecurityGroups",
            "ec2:StopInstances",
            "ec2:CreateNetworkInterface",
            "ec2:CancelSpotInstanceRequests",
            "cloudwatch:*",
            "sqs:PurgeQueue",
            "iam:GetRolePolicy",
            "dynamodb:UpdateTable",
            "ec2:RequestSpotInstances",
            "ec2:DeleteTags",
            "sns:ListTopics",
            "ec2:ModifyImageAttribute",
            "iam:PassRole",
            "sns:Publish",
            "ec2:DescribeNetworkInterfaces",
            "ec2:CreateSecurityGroup",
            "rds:DescribeDBInstances",
            "ec2:ModifyInstanceAttribute",
            "ec2:AuthorizeSecurityGroupEgress",
            "ec2:DetachNetworkInterface",
            "ec2:TerminateInstances",
            "iam:GetInstanceProfile",
            "sns:GetTopicAttributes",
            "datapipeline:DescribeObjects",
            "dynamodb:Query",
            "iam:ListInstanceProfiles",
            "ec2:Describe*",
            "ec2:DeleteSecurityGroup",
            "redshift:DescribeClusters",
            "sqs:CreateQueue",
            "elasticmapreduce:*",
            "s3:Put*"
        ],
        "Resource": "*"
    },
    {
        "Sid": "VisualEditor2",
        "Effect": "Allow",
        "Action": [
            "iam:PassRole",
            "s3:Get*",
            "s3:List*",
            "s3:Put*",
            "sts:AssumeRole"
        ],
        "Resource": [
            "arn:aws:iam::*:role/DataPipelineDefaultResourceRole",
            "arn:aws:iam::*:role/DataPipelineDefaultRole",
            "arn:aws:s3:::*/*"
        ]
    },
    {
        "Sid": "VisualEditor3",
        "Effect": "Allow",
        "Action": [
            "s3:Get*",
            "s3:List*",
            "s3:Put*"
        ],
        "Resource": "arn:aws:s3:::*"
    },
    {
        "Sid": "VisualEditor4",
        "Effect": "Allow",
        "Action": [
            "s3:Get*",
            "s3:List*",
            "s3:Put*"
        ],
        "Resource": "*"
    }
]
}

DataPipelineDefaultResourceRole policy:

{
"Version": "2012-10-17",
"Statement": [
    {
        "Effect": "Allow",
        "Action": [
            "cloudwatch:*",
            "datapipeline:*",
            "dynamodb:*",
            "ec2:Describe*",
            "elasticmapreduce:AddJobFlowSteps",
            "elasticmapreduce:Describe*",
            "elasticmapreduce:ListInstance*",
            "rds:Describe*",
            "redshift:DescribeClusters",
            "redshift:DescribeClusterSecurityGroups",
            "s3:*",
            "sdb:*",
            "sns:*",
            "sqs:*"
        ],
        "Resource": [
            "*"
        ]
    }
]
}
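As a sanity check on the two policies above, here is a minimal sketch that tests which Allow statements cover a given action and resource. It is a simplified approximation of IAM evaluation, not the real engine: it ignores Deny, NotAction, NotResource and Condition, and it uses `fnmatchcase`, which treats `[` specially while IAM does not. The policy excerpt is copied from the DataPipelineDefaultRole policy above; the helper name `allowing_statements` and the account ID `123456789012` are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM accepts either a single string/object or a list."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def allowing_statements(policy, action, resource):
    """Return the Sid (or index) of each Allow statement matching both inputs.

    Simplified sketch: Deny, NotAction, NotResource and Condition are ignored.
    """
    matches = []
    for index, stmt in enumerate(_as_list(policy["Statement"])):
        if stmt.get("Effect") != "Allow":
            continue
        # Action names are case-insensitive in IAM, resource ARNs are not.
        action_ok = any(
            fnmatchcase(action.lower(), pattern.lower())
            for pattern in _as_list(stmt.get("Action", []))
        )
        resource_ok = any(
            fnmatchcase(resource, pattern)
            for pattern in _as_list(stmt.get("Resource", []))
        )
        if action_ok and resource_ok:
            matches.append(stmt.get("Sid", index))
    return matches


# Excerpt of the DataPipelineDefaultRole policy shown in the question.
POLICY_EXCERPT = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": ["iam:PassRole", "s3:Get*", "s3:List*", "s3:Put*",
                       "sts:AssumeRole"],
            "Resource": [
                "arn:aws:iam::*:role/DataPipelineDefaultResourceRole",
                "arn:aws:iam::*:role/DataPipelineDefaultRole",
                "arn:aws:s3:::*/*",
            ],
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": ["s3:Get*", "s3:List*", "s3:Put*"],
            "Resource": "arn:aws:s3:::*",
        },
    ],
}

# Object-level and bucket-level checks against the log bucket from the logs.
print(allowing_statements(POLICY_EXCERPT, "s3:GetObject",
                          "arn:aws:s3:::mylogbucket/logs/file.json"))
print(allowing_statements(POLICY_EXCERPT, "s3:ListBucket",
                          "arn:aws:s3:::mylogbucket"))
```

Under these simplifications the excerpt does grant the S3 actions named in the warning, both on the bucket and on its objects, which suggests the identity policy itself is probably not what the validator objects to.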

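I have been researching this problem for more than a week and have tried all the existing solutions: I updated the trust relationships, recreated the roles, kept the default roles, let Data Pipeline create new roles, and checked the security groups. The problem persists.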
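Since the warning mentions sts:AssumeRole "for DataPipeline", one thing worth verifying is the trust relationship rather than the permission policy. The sketch below lists which service principals a trust policy document allows to assume the role. The example document and the set of required principals are assumptions based on my understanding of the AWS defaults (datapipeline.amazonaws.com and elasticmapreduce.amazonaws.com for DataPipelineDefaultRole, ec2.amazonaws.com for DataPipelineDefaultResourceRole); the helper names are hypothetical, and the actual trust policy would have to be copied from the IAM console.

```python
def trusted_services(trust_policy):
    """Collect the service principals allowed to call sts:AssumeRole."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]

    services = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if not any(a.lower() in ("sts:assumerole", "sts:*", "*") for a in actions):
            continue
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "Principal": "*" names no specific service
        service = principal.get("Service", [])
        services.update([service] if isinstance(service, str) else service)
    return services


def missing_principals(trust_policy, required):
    """Return the required service principals the trust policy does not list."""
    return sorted(set(required) - trusted_services(trust_policy))


# Assumed shape of the default trust policy for DataPipelineDefaultRole.
EXAMPLE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "elasticmapreduce.amazonaws.com",
                    "datapipeline.amazonaws.com",
                ]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

# Assumed requirement for the pipeline role, not taken from the question.
REQUIRED_FOR_PIPELINE_ROLE = {
    "datapipeline.amazonaws.com",
    "elasticmapreduce.amazonaws.com",
}

print(missing_principals(EXAMPLE_TRUST_POLICY, REQUIRED_FOR_PIPELINE_ROLE))
```

An empty result means every assumed principal is trusted; any name that is printed would be a candidate to add under "Trust relationships" for that role.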

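After activating the pipeline and checking the log URI, I found 2 folders, Ec2Instance and RedshiftLoadActivity. The Redshift log file contains only 2 lines; the other one has more [INFO] lines describing TaskRunner downloading the jar and the S3 files.

The logs contain [INFO] entries and these [WARN] entries: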

Ec2Instance:

private.com.amazonaws.services.s3.internal.S3V4AuthErrorRetryStrategy: Attempting to re-send the request to mylogbucket.s3.eu-central-1.amazonaws.com with AWS V4 authentication. To avoid this warning in the future, please use region-specific endpoint to access buckets located in regions that require V4 signing.

RedshiftLoadActivity:

private.com.amazonaws.services.s3.internal.S3V4AuthErrorRetryStrategy: Attempting to re-send the request to mylogbucket.s3.eu-central-1.amazonaws.com with AWS V4 authentication. To avoid this warning in the future, please use region-specific endpoint to access buckets located in regions that require V4 signing.

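The problem should be in the roles and policies, but I made sure that Redshift and the S3 bucket are not the issue: I ran a COPY command in the query editor and it loaded the data as expected.

I am still stuck on this error and would appreciate any suggestions on how to resolve it.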


548 [ERROR] (TaskRunnerService-resource:df-0539055_@Ec2Instance_2019-05-30T13:38:35-0) amazonaws.datapipeline.database.ConnectionFactory: Unable to establish connection to jdbc:postgresql://redshift-cluster-1.coykb9.eu-central-1.redshift.amazonaws.com:5439/db Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.

1 Answer:

Answer 0: (score: 0)

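Data Pipeline uses EMR to spawn EC2 instances and complete the submitted tasks.

Check the EC2 instance profile and the EMR role of the EMR cluster that Data Pipeline spawns, and attach an S3 access policy to the EC2 instance profile role.

By default, the EC2 instance profile is DataPipelineDefaultResourceRole.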

To fix

Error: Unable to establish connection to jdbc:postgresql://redshift-cluster-1.coykb9.eu-central-1.redshift.amazonaws.com:5439/db Connection refused

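Update the Redshift security group inbound rules to allow connections from 0.0.0.0/0. Note that this means any machine on the internet can connect if it has the credentials.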
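To confirm whether an inbound rule change actually took effect, a plain TCP check against the cluster endpoint can help. This is a minimal sketch using only the standard library; the helper name `can_connect` is hypothetical, and the result is only meaningful when it runs from the same network as the pipeline's EC2 instance, because the security group evaluates the caller's source address.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Example call with the endpoint and port from the error message above:
# can_connect("redshift-cluster-1.coykb9.eu-central-1.redshift.amazonaws.com", 5439)
```

A False result with the rule in place would point at something other than the security group, for example the cluster not being publicly accessible or a subnet routing issue.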