I have a program that dumps delimited data files to S3.
I have a Pig script that loads the data from an S3 bucket. I specified a .zip extension in the file names so that Pig knows which compression is being used.
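For context, here is a hypothetical sketch of the kind of writer that produces these tab-delimited files. The class name, output file name, and values are made up; only the record layout (two longs and a double, tab-separated) matches the schema used in the Pig script further down.

import java.io.IOException;
import java.io.PrintWriter;

public class DelimitedDumpExample {
    public static void main(String[] args) throws IOException {
        // Each line is one record: field1 (long), field2 (long), field3 (double),
        // separated by tabs, matching PigStorage('\t').
        try (PrintWriter out = new PrintWriter("filename1.txt")) {
            out.println(123L + "\t" + 456L + "\t" + 7.89);
            out.println(124L + "\t" + 457L + "\t" + 8.90);
        }
    }
}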
The Pig script runs and dumps the data back to S3.
The logs show that it is processing records, but the dumped files are all empty.
Here is an excerpt from the logs:
Input(s):
Successfully read 375 records (435 bytes) from: "s3://<bucket-name>/<job-id>/test-folder/filename1.zip"
Successfully read 444 records (442 bytes) from: "s3://<bucket-name>/<job-id>/test-folder/filename2.zip"
Output(s):
Successfully stored 375 records (1605 bytes) in: "s3://<bucket-name>/<job-id>/test-folder/output/output1-folder"
Successfully stored 444 records (1814 bytes) in: "s3://<bucket-name>/<job-id>/test-folder/output/output2-folder"
Successfully stored 0 records in: "s3://<bucket-name>/<job-id>/test-folder/output/output3-folder"
The code that loads and stores the data is:
data1 = load '$input1'
using PigStorage('\t') as
(field1:long,
field2:long,
field3:double
);
data2 = load '$input2'
using PigStorage('\t') as
(field1:long,
field2:long,
field3:double
);
store output1 into '$output1-folder'
using PigStorage('\t', '-schema');
store output2 into '$output2-folder'
using PigStorage('\t', '-schema');
store output3 into '$output3-folder'
using PigStorage('\t', '-schema');
The code that compresses the files:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public static void compressFile(String originalArchive, String zipArchive) throws IOException {
    try (
        ZipOutputStream archive = new ZipOutputStream(new FileOutputStream(zipArchive));
        FileInputStream file = new FileInputStream(originalArchive)
    ) {
        final int bufferSize = 100 * 1024;
        byte[] buffer = new byte[bufferSize];
        // Single entry per archive; the entry is named after the zip path itself.
        archive.putNextEntry(new ZipEntry(zipArchive));
        int count;
        // Copy the original file into the zip entry in bufferSize chunks.
        while ((count = file.read(buffer)) != -1) {
            archive.write(buffer, 0, count);
        }
        archive.closeEntry();
        // Both streams are closed automatically by try-with-resources.
    }
}
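For completeness, a minimal hypothetical usage sketch; the local file name and zip name are placeholders, and the subsequent upload of the .zip to S3 is a separate step that is not shown here.

// Hypothetical invocation: zip the delimited dump before it is uploaded to S3.
public static void main(String[] args) throws IOException {
    compressFile("filename1.txt", "filename1.zip");
}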
Any help is appreciated!
Thanks!