How do I transfer data from AWS Oracle RDS to S3 (and then Glacier)?

Asked: 2015-12-29 07:34:48

Tags: oracle amazon-web-services amazon-s3 rds amazon-data-pipeline

I want to use the AWS Data Pipeline service to transfer data from an Oracle RDS database to S3, and then on to Glacier. Can someone tell me how to accomplish this?

4 answers:

Answer 0 (score: 3)

You can set up an AWS Data Pipeline to perform a daily incremental copy of an RDS Oracle table to S3. Once the data is in an S3 bucket, you can archive it to Glacier. https://aws.amazon.com/blogs/aws/archive-s3-to-glacier/

You will need to upload the Oracle JDBC driver, downloaded from http://www.oracle.com/technetwork/database/features/jdbc/jdbc-drivers-12c-download-1958347.html, to an S3 bucket location, and specify that S3 path in the jdbcDriverJarUri field.

See http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-rdsdatabase.html

Here is a sample Data Pipeline template that schedules an Amazon EC2 instance to perform an incremental data copy from an Amazon RDS Oracle table to Amazon S3. The RDS Oracle table must have a column that stores the last-modified time value. The template copies the changes made to the table within each scheduled interval, starting from the scheduled start time. Physical deletes from the table are not copied. The output is written as a CSV file to a timestamped subfolder under the output S3 folder.
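As a concrete example, uploading the driver jar with the AWS CLI might look like the following; the bucket name and key are placeholders, not part of the original answer:

```shell
# Upload the Oracle JDBC driver jar (downloaded from the Oracle site above)
# to an S3 location the pipeline can read. Bucket and key are hypothetical --
# substitute your own.
aws s3 cp ojdbc7.jar s3://my-pipeline-artifacts/drivers/ojdbc7.jar

# Verify the upload; the resulting S3 URI is what goes into jdbcDriverJarUri.
aws s3 ls s3://my-pipeline-artifacts/drivers/
```

With this layout, jdbcDriverJarUri would be set to s3://my-pipeline-artifacts/drivers/ojdbc7.jar.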

{
    "metadata": {
        "templateName": "Incremental copy of RDS Oracle table to S3",
        "templateDescription": "Incremental copy of RDS Oracle table to S3"
    },
    "objects": [
        {
            "name": "DailySchedule",
            "id": "DailySchedule",
            "startAt" : "FIRST_ACTIVATION_DATE_TIME",
            "period": "1 hour",
            "type": "Schedule"
        },
        {
            "id": "Default",
            "name": "Default",
            "schedule": {
                "ref": "DailySchedule"
            },
            "failureAndRerunMode": "CASCADE",
            "role": "DataPipelineDefaultRole",
            "resourceRole": "DataPipelineDefaultResourceRole"
        },
        {
            "name":"SourceRDSTable",
            "id":"SourceRDSTable",
            "type":"RdsDatabase",
            "table":"#{myRDSTableName}",
            "username":"#{myRDSUsername}",
            "*password":"#{*myRDSPassword}",
            "jdbcDriverJarUri" : "#{myOracleJdbcDriverUri}",
            "rdsInstanceId":"#{myRDSInstanceId}",
            "scheduleType": "TIMESERIES",
            "selectQuery":"select * from #{table} where #{myRDSTableLastModifiedCol} >= '#{format(@scheduledStartTime, 'YYYY-MM-dd HH-mm-ss')}' and #{myRDSTableLastModifiedCol} <= '#{format(@scheduledEndTime, 'YYYY-MM-dd HH-mm-ss')}'"
        },
        {
            "name":"DestinationS3Location",
            "id":"DestinationS3Location",
            "type":"S3DataNode",
            "scheduleType": "TIMESERIES",
            "directoryPath":"#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}"

        },
        {
            "name":"RDSToS3CopyActivity",
            "id":"RDSToS3CopyActivity",
            "type":"CopyActivity",
            "scheduleType": "TIMESERIES",
            "input":{
                "ref":"SourceRDSTable"
            },
            "output":{
                "ref":"DestinationS3Location"
            },
            "runsOn":{
                "ref":"Ec2Instance"
            }
        },
        {
            "name":"Ec2Instance",
            "id":"Ec2Instance",
            "type":"Ec2Resource",
            "scheduleType": "TIMESERIES",
            "instanceType":"#{myEC2InstanceType}",
            "securityGroups":"#{myEc2RdsSecurityGrps}",
            "terminateAfter":"2 hours",
            "actionOnTaskFailure":"terminate"
        }
    ],
    "parameters": [
        {
            "id":"myRDSInstanceId",
            "type":"String",
            "description":"RDS Oracle my_db_instance_identifier"
        },
        {
            "id":"myOracleJdbcDriverUri",
            "type":"String",
            "description":"S3 path of Oracle Jdbc Driver."
        },
        {
            "id":"myRDSUsername",
            "type":"String",
            "description":"RDS username"
        },
        {
            "id":"*myRDSPassword",
            "type":"String",
            "description":"RDS  password"
        },
        {
            "id":"myRDSTableName",
            "type":"String",
            "description":"RDS  table name"
        },
        { 
            "id": "myEc2RdsSecurityGrps",
            "type":"String",
            "isArray": "true",
            "description": "RDS security group(s)",
            "optional" :"true",
            "helpText" :"The names of one or more EC2 security groups that have access to the RDS cluster.",
            "watermark": "security group name"
        },
        {
            "id":"myRDSTableLastModifiedCol",
            "type":"String",
            "description":"Last modified column name",
            "helpText": "Name of the column that stores the last modified time value in the RDS table."
        },
        {
            "id":"myEC2InstanceType",
            "type":"String",
            "default":"t1.micro",
            "description":"EC2 instance type",
            "helpText": "The type of the EC2 instance that will be launched on your behalf to do the copy"
        },
        {
            "id":"myOutputS3Loc",
            "type":"AWS::S3::ObjectKey",
            "description":"Output S3 folder"
        }
    ]
}
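Assuming the template above is saved as pipeline-definition.json, it could be registered and activated with the AWS CLI roughly as follows; the pipeline name, pipeline id, and all parameter values are illustrative placeholders (the *myRDSPassword parameter would be supplied the same way):

```shell
# Create an empty pipeline; note the pipeline id that the call returns.
aws datapipeline create-pipeline \
    --name rds-oracle-to-s3 \
    --unique-id rds-oracle-to-s3-token

# Upload the pipeline definition (objects + parameters). Parameter values
# such as myRDSInstanceId are supplied at this step. The pipeline id below
# is a placeholder for the id returned by create-pipeline.
aws datapipeline put-pipeline-definition \
    --pipeline-id df-EXAMPLE1234567 \
    --pipeline-definition file://pipeline-definition.json \
    --parameter-values \
        myRDSInstanceId=my-oracle-instance \
        myRDSUsername=admin \
        myRDSTableName=ORDERS \
        myRDSTableLastModifiedCol=LAST_MODIFIED \
        myOutputS3Loc=s3://my-export-bucket/oracle-export \
        myOracleJdbcDriverUri=s3://my-pipeline-artifacts/drivers/ojdbc7.jar

# Start the schedule.
aws datapipeline activate-pipeline --pipeline-id df-EXAMPLE1234567
```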

Answer 1 (score: 0)

Once the data is in S3 (specifically, Standard storage), you can use an S3 lifecycle policy to transition the objects to a lower-cost storage option.

See the documentation here for details: http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
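For illustration, a lifecycle configuration that transitions objects under the pipeline's output prefix to Glacier after 30 days could look like this (the rule ID, prefix, and day count are placeholders); it can be applied to the bucket with `aws s3api put-bucket-lifecycle-configuration`:

```json
{
    "Rules": [
        {
            "ID": "archive-oracle-exports",
            "Filter": { "Prefix": "oracle-export/" },
            "Status": "Enabled",
            "Transitions": [
                { "Days": 30, "StorageClass": "GLACIER" }
            ]
        }
    ]
}
```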

Answer 2 (score: 0)

Cloudtechnician - you can also open a case with AWS. We would like to hear whether the documentation above is sufficient, and if not, how we can clarify it to help you better.

Please identify yourself with this Stack Overflow URL so that we can reference it.

Answer 3 (score: 0)

For ad-hoc full-table data dumps to S3 (non-incremental), you can spin up an EC2 Windows instance and run Oracle_To_S3_Data_Uploader on it.

It requires minimal setup: you supply the SQL query and the target S3 bucket name.

The data will be compressed and streamed to S3 using the AWS multipart upload protocol.