Is there a way to load multiple Redshift tables from a single S3 JSON file, ignoring objects that don't exactly match my JSONPaths file?
I have the following two tables in Redshift:
create table employee
(
employee_id int,
employee_name varchar(50),
employee_age int,
department_id int,
created_date timestamp
);
create table department
(
department_id int,
department_name varchar(50),
created_date timestamp
);
My JSON file "employee_department.json" (containing both employee and department data):
{
"employee_id": 1,
"employee_name": "Jack",
"employee_age": 38,
"department_id": 100,
"created_date": "2016-11-07 10:00:00"
}
{
"employee_id": 2,
"employee_name": "Jill",
"employee_age": 26,
"department_id": 101,
"created_date": "2016-11-07 11:00:00"
}
{
"department_id": 1,
"department_name": "Sales",
"created_date": "2016-01-01 03:00:00"
}
{
"department_id": 2,
"department_name": "Finance",
"created_date": "2016-01-01 04:30:00"
}
The manifest file "employee_department_manifest.manifest":
{
"entries": [
{"url":"s3://mybucket/employee_department.json","mandatory":true}
]
}
The two JSONPaths files, "employee_jsonpath.json" and "department_jsonpath.json":
{
"jsonpaths": [
"$['employee_id']",
"$['employee_name']",
"$['employee_age']",
"$['department_id']",
"$['created_date']"
]
}
{
"jsonpaths": [
"$['department_id']",
"$['department_name']",
"$['created_date']"
]
}
The COPY commands below run successfully, but as the screenshot shows, I end up with 4 records in each table instead of the expected 2, and the wrong department_id data gets loaded into the department table. Is there a way to restrict COPY to the records that exactly match the JSONPaths file and ignore the rest? I realize that in this case I could split the JSON into two separate files, but it would be good to know whether this is possible.
COPY employee
FROM 's3://mybucket/employee_department_manifest.manifest'
CREDENTIALS ''
JSON 's3://mybucket/employee_jsonpath.json'
MANIFEST;
COPY department
FROM 's3://mybucket/employee_department_manifest.manifest'
CREDENTIALS ''
JSON 's3://mybucket/department_jsonpath.json'
MANIFEST;
select * from employee;
select * from department;
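One workaround I've considered (a sketch, not tested against my cluster): since COPY with a JSONPaths file fills missing paths with NULL rather than skipping the object, load everything once into a wide staging table with `JSON 'auto'` (which matches keys to column names), then route rows to the real tables based on which keys were actually present. The staging table name here is made up for illustration:

```sql
-- Staging table wide enough for both object shapes; keys absent
-- from an object simply load as NULL in that row.
create temp table staging_raw
(
employee_id int,
employee_name varchar(50),
employee_age int,
department_id int,
department_name varchar(50),
created_date timestamp
);

COPY staging_raw
FROM 's3://mybucket/employee_department_manifest.manifest'
CREDENTIALS ''
JSON 'auto'
MANIFEST;

-- Employee objects are the ones that carry an employee_id.
insert into employee
select employee_id, employee_name, employee_age, department_id, created_date
from staging_raw
where employee_id is not null;

-- Department objects carry a department_name but no employee_id.
insert into department
select department_id, department_name, created_date
from staging_raw
where employee_id is null and department_name is not null;
```

This only costs one extra COPY pass and avoids maintaining two JSONPaths files, at the price of the staging table needing a column for every key that appears anywhere in the file.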