I have the following JSON data in an S3 bucket:
{
"campaigns": [
{"campaign_reach": 123456,
"campaign_spend": 123456.0,
"campaign_goal": 12345678,
"id": "cda05a432b3b44c18c009a4a961f644a",
"campaign_name": "Campaign1",
"publisher_name": "PublisherA",
"campaign_impressions": 123456}],
"line_items": [],
"podcasts": [
{"podcast_name": "PodcastA", "id": "86edbca2dc644ba8960c8f4bd55bdc19"},
{"podcast_name": "PodcastB", "id": "fc3f2dc4c20949edaaf2186613ec7e47"}]
}
I am using COPY to load the "campaigns" section into a table in Redshift.
I tried loading it with a jsonpaths file:
query_copy = """copy myschema.campaigns
from 's3://mybucket/mapping.json'
credentials 'aws_access_key_id=""" + acc + """;aws_secret_access_key=""" + sh + """'
json 's3://mybucket/campaign_jsonpaths.json'
;"""
My jsonpaths file, campaign_jsonpaths.json:
{
"jsonpaths": [
"$['id']",
"$['campaign_name']",
"$['campaign_reach'][0]",
"$['campaign_spend']",
"$['campaign_goal']",
"$['campaign_impressions']",
"$['publisher_name']",
]
}
I also tried json 'auto':
query_copy = """copy myschema.campaigns
from 's3://mybucket/mapping.json'
credentials 'aws_access_key_id=""" + acc + """;aws_secret_access_key=""" + sh + """'
json 'auto'
;"""
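Both commands run successfully, but the table in Redshift is empty, and there are no errors in stl_load_errors.
I found a similar post here, but it has no answer: Redshift: copy command Json data from s3
Any help would be greatly appreciated.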
Answer 0 (score: 0)
I was able to load the table successfully by doing the following:
Created the campaigns table based on your JSON data:
create table campaigns
(
id varchar(100),
campaign_name varchar(100),
campaign_reach int,
campaign_spend float,
campaign_goal int,
campaign_impressions int,
publisher_name varchar(100)
);
Created the mapping.json file from your JSON data.
Created campaigns_jsonpaths.json as follows:
{
"jsonpaths": [
"$['campaigns'][0]['id']",
"$['campaigns'][0]['campaign_name']",
"$['campaigns'][0]['campaign_reach']",
"$['campaigns'][0]['campaign_spend']",
"$['campaigns'][0]['campaign_goal']",
"$['campaigns'][0]['campaign_impressions']",
"$['campaigns'][0]['publisher_name']"
]
}
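The difference from the jsonpaths file in the question is that every path here starts at the "campaigns" array and selects element 0, whereas the paths in the question look for keys such as id at the top level of the document, where they do not exist. A minimal Python sketch of that difference follows. The resolve helper is hypothetical and only mimics bracket-notation traversal for illustration; it is not Redshift's own parser, and the sample document is trimmed to a few fields.

```python
import json
import re

# Sample document from the question, trimmed to the fields used below.
doc = json.loads("""
{
  "campaigns": [
    {"campaign_reach": 123456,
     "id": "cda05a432b3b44c18c009a4a961f644a",
     "campaign_name": "Campaign1"}
  ],
  "line_items": [],
  "podcasts": []
}
""")

# Matches one bracket step: either ['key'] or [index].
STEP = re.compile(r"\['([^']*)'\]|\[(\d+)\]")


def resolve(path, data):
    """Walk a path such as $['a'][0]['b']; return None if any step is missing.

    Hypothetical helper for illustration only.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    current = data
    for key, index in STEP.findall(path[1:]):
        try:
            current = current[int(index)] if index else current[key]
        except (KeyError, IndexError, TypeError):
            return None
    return current


# Paths from the question: the keys are not at the top level.
print(resolve("$['id']", doc))
print(resolve("$['campaign_reach'][0]", doc))

# Path from this answer: descend into the first campaign first.
print(resolve("$['campaigns'][0]['id']", doc))
```

Only the last path reaches a value in this sketch; the first two find nothing at the top level of the document.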
Ran the COPY:
copy campaigns
from 's3://<bucket>/mapping.json'
iam_role 'arn:aws:iam::1234567890:role/Redshift-Role'
json 's3://<bucket>/campaigns_jsonpaths.json'
;
The record was loaded successfully into the campaigns table.
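Note that the paths above select index [0], so they address only the first element of "campaigns". If the array can hold several campaigns, one option is to rewrite the file before uploading so that each campaign is its own top-level JSON object, since COPY loads each top-level object in a JSON file as one row. I have not tested this variant against the table above, so treat the following as a sketch under that assumption. The function name and the two-campaign sample (ids "a1" and "b2") are made up for illustration.

```python
import json


def campaigns_to_json_lines(document_text):
    """Return the objects under "campaigns" as one JSON object per line."""
    document = json.loads(document_text)
    return "\n".join(json.dumps(c) for c in document.get("campaigns", []))


# Hypothetical input with two campaigns; not the data from the question.
sample = (
    '{"campaigns": ['
    '{"id": "a1", "campaign_name": "Campaign1"}, '
    '{"id": "b2", "campaign_name": "Campaign2"}], '
    '"line_items": [], "podcasts": []}'
)

print(campaigns_to_json_lines(sample))
```

With a file in that shape, the keys sit at the top level of each object, which is the layout that json 'auto' and the shorter paths from the question expect.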