AWS Data Pipeline - RedshiftCopyActivity - Redshift to S3 - setting the output to pipe-delimited

Time: 2019-01-21 17:11:31

Tags: amazon-s3 amazon-redshift amazon-data-pipeline

I have set up a data pipeline that dumps a table to an S3 file every four hours. Everything works as expected. The only problem is that the file is used to load another Redshift table, which expects a pipe-delimited S3 file. I don't have access to that table's Serde parameters, so I need to force the output S3 file to be pipe ("|") delimited.

Below is the JSON used to create the pipeline:

{
  "objects": [
    {
      "subnetId": "subnet",
      "resourceRole": "DefaultResourceRole",
      "role": "DefaultRole",
      "securityGroupIds": "someid",
      "instanceType": "m3.large",
      "name": "DefaultResource",
      "keyPair": "blah",
      "id": "ResourceId",
      "type": "Ec2Resource",
      "terminateAfter": "15 Minutes"
    },
    {
      "databaseName": "blah",
      "*password": "blahwithsymbols",
      "name": "DefaultDatabase",
      "id": "DatabaseId",
      "clusterId": "production",
      "type": "RedshiftDatabase",
      "username": "blahblah"
    },
    {
      "period": "4 Hours",
      "name": "Every 4 hours",
      "id": "DefaultSchedule",
      "type": "Schedule",
      "startAt": "FIRST_ACTIVATION_DATE_TIME"
    },
    {
      "failureAndRerunMode": "CASCADE",
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "resourceRole": "ResourceRole",
      "role": "Role",
      "pipelineLogUri": "s3://logs.blah.blah",
      "scheduleType": "cron",
      "name": "Default",
      "id": "Default"
    },  
    {
      "output": {
        "ref": "S3DataNodeId"
      },
      "input": {
        "ref": "RedshiftDataNodeId"
      },
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "onSuccess": {
        "ref": "SuccessNotify"
      },
      "onFail": {
        "ref": "FailureNotify"
      },
      "name": "Copy",
      "id": "RedshiftCopyActivityId",
      "runsOn": {
        "ref": "ResourceId"
      },
      "type": "RedshiftCopyActivity",
      "insertMode": "TRUNCATE"      
    },
    {
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "database": {
        "ref": "DatabaseId"
      },
      "name": "DefaultRedshiftDataNode",
      "id": "RedshiftDataNodeId",
      "type": "RedshiftDataNode",
      "tableName": "blah"
    },
    {
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "directoryPath": "s3://blah.com/blah/#{format(@scheduledStartTime,'YYYY')}/#{month(@scheduledStartTime)}",
      "name": "DefaultS3DataNode",
      "id": "S3DataNodeId",
      "type": "S3DataNode"
    },
    {
      "subject": "Load SUCCESS: #{node.@scheduledStartTime}",
      "name": "Load Success SNS",
      "id": "SuccessNotify",
      "message": "Successfully ran RedshiftCopyActivity #{node.name}.",
      "type": "SnsAlarm",
      "topicArn": "-"
    },  
    {
      "subject": "Load FAILURE: #{node.@scheduledStartTime}",
      "name": "Failed SNS",
      "id": "FailureNotify",
      "message": "FAILED: RedshiftCopyActivity #{node.name}.",
      "type": "SnsAlarm",
      "topicArn": "blah"
    }  
  ],
  "parameters": []
}

Do I need to create a "data format" section for the delimiter, or add something like a "delimiter" field to the S3 node?
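To be concrete, here is roughly what I imagine adding, based on the Custom data format described in the AWS Data Pipeline docs: a new data format object with a columnSeparator, plus a dataFormat reference on the existing S3 node. The PipeDelimitedFormat object is my guess and is untested:

```json
{
  "id": "PipeDelimitedFormat",
  "name": "PipeDelimitedFormat",
  "type": "Custom",
  "columnSeparator": "|"
},
{
  "schedule": {
    "ref": "DefaultSchedule"
  },
  "directoryPath": "s3://blah.com/blah/#{format(@scheduledStartTime,'YYYY')}/#{month(@scheduledStartTime)}",
  "dataFormat": {
    "ref": "PipeDelimitedFormat"
  },
  "name": "DefaultS3DataNode",
  "id": "S3DataNodeId",
  "type": "S3DataNode"
}
```

Is that the right direction, or does RedshiftCopyActivity ignore the data format on the output node?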

Thanks.

0 Answers:

There are no answers yet.