Loading JSON from S3 into Redshift

Time: 2020-05-18 19:15:35

Tags: json amazon-s3 amazon-redshift

I have the following JSON data in an S3 bucket:

{
"campaigns": [
{"campaign_reach": 123456, 
"campaign_spend": 123456.0, 
"campaign_goal": 12345678, 
"id": "cda05a432b3b44c18c009a4a961f644a", 
"campaign_name": "Campaign1", 
"publisher_name": "PublisherA", 
"campaign_impressions": 123456}], 
"line_items": [], 
"podcasts": [
{"podcast_name": "PodcastA", "id": "86edbca2dc644ba8960c8f4bd55bdc19"}, 
{"podcast_name": "PodcastB", "id": "fc3f2dc4c20949edaaf2186613ec7e47"}]
}

I'm using COPY to load the "campaigns" section into a table in Redshift.

I tried loading it with a jsonpaths file:

query_copy = """copy myschema.campaigns
from 's3://mybucket/mapping.json'
credentials 'aws_access_key_id=""" + acc + """;aws_secret_access_key=""" + sh + """'
json 's3://mybucket/campaign_jsonpaths.json'
;"""

My jsonpaths file, "campaign_jsonpaths.json":

{
    "jsonpaths": [
        "$['id']",
        "$['campaign_name']",
        "$['campaign_reach'][0]",
        "$['campaign_spend']",
        "$['campaign_goal']",
        "$['campaign_impressions']",
        "$['publisher_name']"
    ]
}
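One way to see why these paths find nothing: Redshift's COPY loads each top-level JSON object in the file as one row, and this file contains a single top-level object whose only keys are campaigns, line_items and podcasts. The minimal sketch below resolves bracket-notation paths against the sample document. `resolve` and `MISSING` are hypothetical helpers that handle only the `['key']` and `[n]` steps used in this question; they are not Redshift's JSONPath implementation.

```python
import json
import re

# The question's sample document (same keys and values as shown above).
DOC = json.loads("""
{"campaigns": [{"campaign_reach": 123456, "campaign_spend": 123456.0,
                "campaign_goal": 12345678,
                "id": "cda05a432b3b44c18c009a4a961f644a",
                "campaign_name": "Campaign1", "publisher_name": "PublisherA",
                "campaign_impressions": 123456}],
 "line_items": [],
 "podcasts": [
   {"podcast_name": "PodcastA", "id": "86edbca2dc644ba8960c8f4bd55bdc19"},
   {"podcast_name": "PodcastB", "id": "fc3f2dc4c20949edaaf2186613ec7e47"}]}
""")

# Sentinel returned when a path step does not exist in the document.
MISSING = object()

# One bracket step: ['key'] or [index].
_STEP = re.compile(r"\[(?:'([^']*)'|(\d+))\]")


def resolve(doc, path):
    """Follow a path like $['campaigns'][0]['id'] through doc.

    Hypothetical helper: supports only the ['key'] and [n] steps used in
    this question, not full JSONPath, and is not Redshift's implementation.
    """
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    node, pos = doc, 1
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax in {path!r}")
        key, index = m.group(1), m.group(2)
        if key is not None:
            # Object step: the key must exist at this level.
            if not isinstance(node, dict) or key not in node:
                return MISSING
            node = node[key]
        else:
            # Array step: the index must be in range.
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return MISSING
            node = node[i]
        pos = m.end()
    return node


for p in ["$['id']", "$['campaign_name']", "$['campaign_reach'][0]",
          "$['campaigns'][0]['id']"]:
    value = resolve(DOC, p)
    print(p, "->", "MISSING" if value is MISSING else value)
```

The three paths taken from the jsonpaths file above come back MISSING, because they look for campaign fields at the top level; only the path that first steps into campaigns[0] reaches the campaign's id.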

I also tried json 'auto':

query_copy = """copy myschema.campaigns
from 's3://mybucket/mapping.json'
credentials 'aws_access_key_id=""" + acc + """;aws_secret_access_key=""" + sh + """'
json 'auto'
;"""
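The json 'auto' attempt runs into a similar mismatch: 'auto' pairs the top-level keys of each JSON object with column names in the target table, and the only top-level keys in this file are campaigns, line_items and podcasts. A minimal sketch of that comparison, assuming the table has the seven columns from the answer's CREATE TABLE (the question does not show its own definition). `auto_matched_columns` is a hypothetical approximation, not part of Redshift.

```python
import json

# Columns of myschema.campaigns; assumed to match the answer's CREATE TABLE,
# since the question does not show its table definition.
COLUMNS = ["id", "campaign_name", "campaign_reach", "campaign_spend",
           "campaign_goal", "campaign_impressions", "publisher_name"]

# The question's document, reduced to its top-level shape plus one campaign.
DOC = json.loads("""
{"campaigns": [{"campaign_reach": 123456, "campaign_spend": 123456.0,
                "campaign_goal": 12345678,
                "id": "cda05a432b3b44c18c009a4a961f644a",
                "campaign_name": "Campaign1", "publisher_name": "PublisherA",
                "campaign_impressions": 123456}],
 "line_items": [],
 "podcasts": []}
""")


def auto_matched_columns(record, columns):
    """Return the columns whose names appear as top-level keys of record.

    Hypothetical approximation of json 'auto' key-to-column matching
    (exact name match, top level only); not Redshift's actual code.
    """
    return [c for c in columns if c in record]


print(auto_matched_columns(DOC, COLUMNS))                  # []
print(auto_matched_columns(DOC["campaigns"][0], COLUMNS))  # all seven columns
```

The whole document shares no key with the table, while a single campaign object matches every column.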

Both commands run successfully, but the table in Redshift is empty, and there are no errors in stl_load_errors.

I found a similar post here, but it has no answer: Redshift: copy command Json data from s3

Any help would be appreciated.

1 answer:

Answer 0 (score: 0)

I was able to load the table successfully by doing the following:

  1. Created a campaigns table based on your JSON data:

    create table campaigns ( id varchar(100), campaign_name varchar(100), campaign_reach int, campaign_spend float, campaign_goal int, campaign_impressions int, publisher_name varchar(100) );

  2. Created a mapping.json file containing your JSON data.

  3. Created campaigns_jsonpaths.json as follows:

    { "jsonpaths": [ "$['campaigns'][0]['id']", "$['campaigns'][0]['campaign_name']", "$['campaigns'][0]['campaign_reach']", "$['campaigns'][0]['campaign_spend']", "$['campaigns'][0]['campaign_goal']", "$['campaigns'][0]['campaign_impressions']", "$['campaigns'][0]['publisher_name']" ] }

  4. Ran the COPY:

    copy campaigns from 's3://<bucket>/mapping.json' iam_role 'arn:aws:iam::1234567890:role/Redshift-Role' json 's3://<bucket>/campaigns_jsonpaths.json' ;
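Steps 3 and 4 can also be generated from the table's column list, in the same string-building style as the question's code. A minimal sketch: `campaign_jsonpaths` and `build_copy_sql` are hypothetical helpers, and the bucket and role ARN are the answer's placeholders. Redshift expects the JSONPath expressions in the same order as the target columns and does not support wildcards, so each path names one array element explicitly (campaigns[0] by default here).

```python
import json

# Column order of the campaigns table from step 1.
COLUMNS = ["id", "campaign_name", "campaign_reach", "campaign_spend",
           "campaign_goal", "campaign_impressions", "publisher_name"]


def campaign_jsonpaths(columns, index=0):
    """Build a jsonpaths document pointing at campaigns[index] for each column.

    The paths are emitted in the same order as the columns they fill.
    """
    return {"jsonpaths": [f"$['campaigns'][{index}]['{c}']" for c in columns]}


def build_copy_sql(table, data_uri, iam_role_arn, jsonpaths_uri):
    """Return a COPY statement shaped like the one in step 4 (hypothetical helper)."""
    return (f"copy {table} from '{data_uri}' "
            f"iam_role '{iam_role_arn}' "
            f"json '{jsonpaths_uri}';")


# Content to upload as campaigns_jsonpaths.json.
print(json.dumps(campaign_jsonpaths(COLUMNS), indent=2))

# Statement to run against Redshift.
print(build_copy_sql("campaigns", "s3://<bucket>/mapping.json",
                     "arn:aws:iam::1234567890:role/Redshift-Role",
                     "s3://<bucket>/campaigns_jsonpaths.json"))
```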

The record was loaded into the campaigns table successfully.
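The jsonpaths in step 3 always read campaigns[0], so only the first campaign is loaded. If the file can hold several campaigns, one alternative (not part of the original answer) is to rewrite the file so that each campaign is its own JSON object on its own line: COPY loads each JSON object as one row, and json 'auto' can then match each campaign's keys to the identically named columns. A minimal sketch of the rewrite step, where `campaigns_to_ndjson` is a hypothetical helper; reading the file from S3 and uploading the result back (for example with boto3) is not shown.

```python
import json


def campaigns_to_ndjson(document_text):
    """Turn the nested document into one JSON object per line, one per campaign.

    Keys other than "campaigns" (line_items, podcasts) are dropped.
    """
    doc = json.loads(document_text)
    return "\n".join(json.dumps(c) for c in doc.get("campaigns", []))


sample = ('{"campaigns": [{"id": "a", "campaign_name": "Campaign1"},'
          ' {"id": "b", "campaign_name": "Campaign2"}],'
          ' "line_items": [], "podcasts": []}')
print(campaigns_to_ndjson(sample))
```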