I want to use the AWS Data Pipeline service to move data from an Oracle RDS database to S3, and then to Glacier. Can someone tell me how to accomplish this?
Answer 0 (score: 3)
You can set up an AWS Data Pipeline to perform a daily incremental copy of your RDS Oracle table to S3. Once the data is in an S3 bucket, you can archive it to Glacier: https://aws.amazon.com/blogs/aws/archive-s3-to-glacier/
You will need to upload the Oracle JDBC driver, downloaded from http://www.oracle.com/technetwork/database/features/jdbc/jdbc-drivers-12c-download-1958347.html, to an S3 bucket location, and specify that S3 path in the jdbcDriverJarUri field.
See http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-rdsdatabase.html.
Below is a sample Data Pipeline template that schedules an Amazon EC2 instance to perform an incremental copy of data from an Amazon RDS Oracle table to Amazon S3. The RDS Oracle table must have a column that stores the last-modified time. The template copies the changes made to the table within each scheduled interval, starting from the scheduled start time. Physical deletes from the table are not copied. Output is written as a CSV file into a timestamped subfolder under the output S3 folder.
{
  "metadata": {
    "templateName": "Incremental copy of RDS Oracle table to S3",
    "templateDescription": "Incremental copy of RDS Oracle table to S3"
  },
  "objects": [
    {
      "name": "DailySchedule",
      "id": "DailySchedule",
      "startAt": "FIRST_ACTIVATION_DATE_TIME",
      "period": "1 hour",
      "type": "Schedule"
    },
    {
      "id": "Default",
      "name": "Default",
      "schedule": {
        "ref": "DailySchedule"
      },
      "failureAndRerunMode": "CASCADE",
      "role": "DataPipelineDefaultRole",
      "resourceRole": "DataPipelineDefaultResourceRole"
    },
    {
      "name": "SourceRDSTable",
      "id": "SourceRDSTable",
      "type": "RdsDatabase",
      "table": "#{myRDSTableName}",
      "username": "#{myRDSUsername}",
      "*password": "#{*myRDSPassword}",
      "jdbcDriverJarUri": "#{myOracleJdbcDriverUri}",
      "rdsInstanceId": "#{myRDSInstanceId}",
      "scheduleType": "TIMESERIES",
      "selectQuery": "select * from #{table} where #{myRDSTableLastModifiedCol} >= '#{format(@scheduledStartTime, 'YYYY-MM-dd HH-mm-ss')}' and #{myRDSTableLastModifiedCol} <= '#{format(@scheduledEndTime, 'YYYY-MM-dd HH-mm-ss')}'"
    },
    {
      "name": "DestinationS3Location",
      "id": "DestinationS3Location",
      "type": "S3DataNode",
      "scheduleType": "TIMESERIES",
      "directoryPath": "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}"
    },
    {
      "name": "RDSToS3CopyActivity",
      "id": "RDSToS3CopyActivity",
      "type": "CopyActivity",
      "scheduleType": "TIMESERIES",
      "input": {
        "ref": "SourceRDSTable"
      },
      "output": {
        "ref": "DestinationS3Location"
      },
      "runsOn": {
        "ref": "Ec2Instance"
      }
    },
    {
      "name": "Ec2Instance",
      "id": "Ec2Instance",
      "type": "Ec2Resource",
      "scheduleType": "TIMESERIES",
      "instanceType": "#{myEC2InstanceType}",
      "securityGroups": "#{myEc2RdsSecurityGrps}",
      "terminateAfter": "2 hours",
      "actionOnTaskFailure": "terminate"
    }
  ],
  "parameters": [
    {
      "id": "myRDSInstanceId",
      "type": "String",
      "description": "RDS Oracle my_db_instance_identifier"
    },
    {
      "id": "myOracleJdbcDriverUri",
      "type": "String",
      "description": "S3 path of Oracle Jdbc Driver."
    },
    {
      "id": "myRDSUsername",
      "type": "String",
      "description": "RDS username"
    },
    {
      "id": "*myRDSPassword",
      "type": "String",
      "description": "RDS password"
    },
    {
      "id": "myRDSTableName",
      "type": "String",
      "description": "RDS table name"
    },
    {
      "id": "myEc2RdsSecurityGrps",
      "type": "String",
      "isArray": "true",
      "description": "RDS security group(s)",
      "optional": "true",
      "helpText": "The names of one or more EC2 security groups that have access to the RDS cluster.",
      "watermark": "security group name"
    },
    {
      "id": "myRDSTableLastModifiedCol",
      "type": "String",
      "description": "Last modified column name",
      "helpText": "Name of the column that stores the last modified time value in the RDS table."
    },
    {
      "id": "myEC2InstanceType",
      "type": "String",
      "default": "t1.micro",
      "description": "EC2 instance type",
      "helpText": "The type of the EC2 instance that will be launched on your behalf to do the copy"
    },
    {
      "id": "myOutputS3Loc",
      "type": "AWS::S3::ObjectKey",
      "description": "Output S3 folder"
    }
  ]
}
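To make the incremental logic in the template's selectQuery field concrete, here is a small Python sketch of how the scheduled start and end times get substituted into the query so that each run copies only the rows modified within its interval. The table and column names are placeholders, and the function is an illustration of the expression substitution, not Data Pipeline's actual evaluator:

```python
from datetime import datetime, timedelta

def build_incremental_query(table, last_modified_col,
                            scheduled_start, scheduled_end):
    """Mimic Data Pipeline's #{format(...)} substitution: select only
    rows whose last-modified timestamp falls inside the interval."""
    fmt = "%Y-%m-%d %H-%M-%S"  # matches 'YYYY-MM-dd HH-mm-ss' in the template
    return (
        f"select * from {table} "
        f"where {last_modified_col} >= '{scheduled_start.strftime(fmt)}' "
        f"and {last_modified_col} <= '{scheduled_end.strftime(fmt)}'"
    )

# One hourly run's window (placeholder values):
start = datetime(2017, 1, 1, 10, 0, 0)
query = build_incremental_query("MY_TABLE", "LAST_MODIFIED",
                                start, start + timedelta(hours=1))
print(query)
```

Because the query's lower bound is the scheduled start time, rows deleted from the table never match, which is why physical deletes are not propagated to S3.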
Answer 1 (score: 0)
Once the data is in S3 (specifically in the Standard storage class), you can use an S3 lifecycle policy to transition the objects to a lower-cost storage option.
See the documentation here for details: http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
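As a sketch of that approach, the lifecycle rule can also be defined programmatically. The rule below builds a configuration that transitions objects under a prefix to Glacier; the bucket name, prefix, and 30-day threshold are placeholder assumptions, and the boto3 call that would apply it is commented out so the snippet runs without AWS credentials:

```python
# Sketch: an S3 lifecycle rule transitioning objects under a prefix to
# the GLACIER storage class after 30 days. Bucket name, prefix, and the
# day threshold are placeholders — adjust to your pipeline's output path.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-rds-exports-to-glacier",
            "Filter": {"Prefix": "rds-oracle-exports/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "GLACIER"}
            ],
        }
    ]
}

# Applying it requires AWS credentials, so the call is commented out:
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket",
#     LifecycleConfiguration=lifecycle_configuration,
# )

rule = lifecycle_configuration["Rules"][0]
```

With this rule in place, no extra pipeline step is needed for the Glacier leg: S3 transitions each exported object automatically once it ages past the threshold.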
Answer 2 (score: 0)
Cloudtechnician - you can also open a case with AWS. We would like to hear whether the documentation above is sufficient and, if not, how we can clarify it to help you better.
Please identify yourself using this Stack Overflow URL so that we have a reference.
Answer 3 (score: 0)
For an ad-hoc full-table data dump to S3 (non-incremental), you can spin up an EC2 Windows instance and run Oracle_To_S3_Data_Uploader on it.
It requires minimal setup: you supply the SQL query and the target S3 bucket name.
The data is compressed and streamed to S3 using the AWS multipart upload protocol.
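The compress-and-stream idea can be sketched locally. The snippet below gzips a CSV payload in memory; in a real run you would hand the compressed file object to boto3's upload_fileobj, which uses the multipart upload protocol automatically for large objects. The CSV content, bucket, and key are placeholders, and the upload call is commented out so the sketch runs without credentials:

```python
import gzip
import io

# Placeholder CSV payload standing in for a full-table dump.
csv_rows = "id,name,last_modified\n1,alice,2017-01-01\n2,bob,2017-01-02\n"

# Compress the dump in memory before uploading.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    gz.write(csv_rows.encode("utf-8"))
buffer.seek(0)

# In a real run (requires credentials; names are placeholders):
# import boto3
# boto3.client("s3").upload_fileobj(buffer, "my-bucket", "dumps/table.csv.gz")

# Sanity check: decompressing restores the original rows.
restored = gzip.decompress(buffer.getvalue()).decode("utf-8")
```

Compressing before upload reduces both transfer time and the S3 (and eventually Glacier) storage footprint, which is why tools in this space stream gzip output rather than raw CSV.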