When I try to copy large TPC-H table data from S3 into my AWS RDS instance, I get the following error:
An error occurred when executing the SQL command:
copy orders from 's3://aqa-pat/tpchData/tpchRawData/100G/orders.tbl'
credentials 'aws_iam_role=arn:aws:iam::183326689449:role/RedshiftRole'
delimiter ...
[Amazon](600000) Error setting/closing connection: Connection reset by peer. [SQL State=HY000, DB Errorcode=600000]
1 statement failed.
Execution time: 15m 3s
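I am running the TPC-H benchmark and created the individual table files with TPC's dbgen tool. I only get this connection error when loading the orders and lineitem tables; the other tables loaded without problems.
What can I do to fix this? Should I move my instance to the same region as the S3 bucket?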
Answer 0 (score: 1)
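To clarify, I was loading 100 GB of data into the table. I assumed Redshift could handle a load of that size in one go, but I ended up falling back to splitting the data.
I solved this by splitting each large table file into several smaller files and listing them in a JSON manifest, as described in the AWS documentation.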
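For the splitting step, here is a minimal sketch of one way to do it, not necessarily what I ran. The function name `split_by_lines`, the `.partNNNN` suffix, and the lines-per-part value are all made up for illustration. It cuts on line boundaries so that no row of the `.tbl` file is split across two parts.

```python
import os

def split_by_lines(src_path, out_dir, lines_per_part):
    """Split src_path into numbered part files of at most lines_per_part lines.

    Hypothetical helper: the naming scheme (<name>.part0000, ...) is an
    assumption, not something Redshift requires.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(src_path)
    part_paths = []
    out = None
    with open(src_path, "rb") as src:
        for i, line in enumerate(src):
            # Start a new part file every lines_per_part lines.
            if i % lines_per_part == 0:
                if out is not None:
                    out.close()
                part_path = os.path.join(
                    out_dir, f"{base}.part{len(part_paths):04d}"
                )
                out = open(part_path, "wb")
                part_paths.append(part_path)
            out.write(line)
    if out is not None:
        out.close()
    return part_paths
```

Calling something like `split_by_lines("orders.tbl", "orders_parts", 5_000_000)` would leave a directory of part files that you then upload to S3 yourself; the right part size depends on your data and cluster.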
Here is what the manifest looks like:
{
"entries": [
{"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true},
{"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},
{"url":"s3://mybucket-beta/2013-10-04-custdata", "mandatory":true},
{"url":"s3://mybucket-beta/2013-10-05-custdata", "mandatory":true}
]
}
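If you have many part files, writing the manifest by hand gets tedious. A rough sketch of generating it, assuming the parts are already uploaded: the bucket name `my-example-bucket`, the key layout, and the helper `build_manifest` are placeholders I invented for illustration. Only the `entries` / `url` / `mandatory` structure comes from the manifest format shown above.

```python
import json

def build_manifest(urls, mandatory=True):
    """Return a dict in the manifest layout: one entry per S3 URL."""
    return {"entries": [{"url": u, "mandatory": mandatory} for u in urls]}

# Hypothetical S3 locations of the uploaded part files.
urls = [
    f"s3://my-example-bucket/orders/orders.tbl.part{i:04d}"
    for i in range(4)
]

manifest_text = json.dumps(build_manifest(urls), indent=2)
print(manifest_text)
```

Save the printed JSON to a file, upload it to S3, and point the COPY command at that object.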
Here is how to load the files listed in that manifest into Redshift:
copy customer
from 's3://mybucket/cust.manifest'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
manifest;
For anyone else with a similar problem, here is the documentation:
http://docs.aws.amazon.com/redshift/latest/dg/loading-data-files-using-manifest.html