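I have built a data pipeline that dumps a table to an S3 file every four hours. Everything works as expected. The only problem is that the file is used to load another Redshift table, and that table expects a pipe-delimited S3 file. I don't have access to that table's SerDe parameters, so I need to force the output S3 file to be delimited by a pipe ("|").
Below is the JSON used to create the pipeline: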
{
  "objects": [
    {
      "subnetId": "subnet",
      "resourceRole": "DefaultResourceRole",
      "role": "DefaultRole",
      "securityGroupIds": "someid",
      "instanceType": "m3.large",
      "name": "DefaultResource",
      "keyPair": "blah",
      "id": "ResourceId",
      "type": "Ec2Resource",
      "terminateAfter": "15 Minutes"
    },
    {
      "databaseName": "blah",
      "*password": "blahwithsymbols",
      "name": "DefaultDatabase",
      "id": "DatabaseId",
      "clusterId": "production",
      "type": "RedshiftDatabase",
      "username": "blahblah"
    },
    {
      "period": "4 Hours",
      "name": "Every 4 hours",
      "id": "DefaultSchedule",
      "type": "Schedule",
      "startAt": "FIRST_ACTIVATION_DATE_TIME"
    },
    {
      "failureAndRerunMode": "CASCADE",
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "resourceRole": "ResourceRole",
      "role": "Role",
      "pipelineLogUri": "s3://logs.blah.blah",
      "scheduleType": "cron",
      "name": "Default",
      "id": "Default"
    },
    {
      "output": {
        "ref": "S3DataNodeId"
      },
      "input": {
        "ref": "RedshiftDataNodeId"
      },
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "onSuccess": {
        "ref": "SuccessNotify"
      },
      "onFail": {
        "ref": "FailureNotify"
      },
      "name": "Copy",
      "id": "RedshiftCopyActivityId",
      "runsOn": {
        "ref": "ResourceId"
      },
      "type": "RedshiftCopyActivity",
      "insertMode": "TRUNCATE"
    },
    {
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "database": {
        "ref": "DatabaseId"
      },
      "name": "DefaultRedshiftDataNode",
      "id": "RedshiftDataNodeId",
      "type": "RedshiftDataNode",
      "tableName": "blah"
    },
    {
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "directoryPath": "s3://blah.com/blah/#{format(@scheduledStartTime,'YYYY')}/#{month(@scheduledStartTime)}",
      "name": "DefaultS3DataNode",
      "id": "S3DataNodeId",
      "type": "S3DataNode"
    },
    {
      "subject": "Load SUCCESS: #{node.@scheduledStartTime}",
      "name": "Load Success SNS",
      "id": "SuccessNotify",
      "message": "Successfully ran RedshiftCopyActivity #{node.name}.",
      "type": "SnsAlarm",
      "topicArn": "-"
    },
    {
      "subject": "Load FAILURE: #{node.@scheduledStartTime}",
      "name": "Failed SNS",
      "id": "FailureNotify",
      "message": "FAILED: RedshiftCopyActivity #{node.name}.",
      "type": "SnsAlarm",
      "topicArn": "blah"
    }
  ],
  "parameters": []
}
Do I need to create a "Data Format" section for the delimiter, or add something like a "delimiter" field to the S3 node?
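To make the first option concrete, here is a rough sketch of what such a "Data Format" section might look like. It assumes the `Custom` data format type with its `columnSeparator` and `recordSeparator` fields, referenced from the S3 node through `dataFormat`. The id `PipeDelimitedFormatId` and the name `PipeDelimitedFormat` are made up for illustration, and I have not verified that `RedshiftCopyActivity` honors this format when it writes to S3. Only the new object and the modified S3 node are shown; the other objects would stay as they are.

```json
{
  "objects": [
    {
      "id": "PipeDelimitedFormatId",
      "name": "PipeDelimitedFormat",
      "type": "Custom",
      "columnSeparator": "|",
      "recordSeparator": "\n"
    },
    {
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "dataFormat": {
        "ref": "PipeDelimitedFormatId"
      },
      "directoryPath": "s3://blah.com/blah/#{format(@scheduledStartTime,'YYYY')}/#{month(@scheduledStartTime)}",
      "name": "DefaultS3DataNode",
      "id": "S3DataNodeId",
      "type": "S3DataNode"
    }
  ]
}
```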
Thanks.