Reading zip-compressed files with PigStorage in a Pig script

Time: 2016-07-16 00:02:57

Tags: zip apache-pig ziparchive

I have a program that dumps delimited data files to S3.

I have a Pig script that loads the data from the S3 bucket. I gave the file names a .zip extension so that Pig would know which compression was used.

The Pig script runs and dumps the data back to S3.

The logs show that it is processing records, but the dumped files are all empty.

Here is an excerpt from the logs:

Input(s):
Successfully read 375 records (435 bytes) from: "s3://<bucket-name>/<job-id>/test-folder/filename1.zip"
Successfully read 444 records (442 bytes) from: "s3://<bucket-name>/<job-id>/test-folder/filename2.zip"

Output(s):
Successfully stored 375 records (1605 bytes) in: "s3://<bucket-name>/<job-id>/test-folder/output/output1-folder"
Successfully stored 444 records (1814 bytes) in: "s3://<bucket-name>/<job-id>/test-folder/output/output2-folder"
Successfully stored 0 records in: "s3://<bucket-name>/<job-id>/test-folder/output/output3-folder"

The code that loads and stores the data is:

data1 = load '$input1'
    using PigStorage('\t') as
    (field1:long,
     field2:long,
     field3:double
);

data2 = load '$input2'
    using PigStorage('\t') as
    (field1:long,
     field2:long,
     field3:double
);

store output1 into '$output1-folder'
    using PigStorage('\t', '-schema');

store output2 into '$output2-folder'
    using PigStorage('\t', '-schema');

store output3 into '$output3-folder'
    using PigStorage('\t', '-schema');

The code that compresses the files:

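import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public static void compressFile(String originalArchive, String zipArchive) throws IOException {
    // try-with-resources closes both streams, so no explicit close() calls are needed
    try (
            ZipOutputStream archive = new ZipOutputStream(new FileOutputStream(zipArchive));
            FileInputStream file    = new FileInputStream(originalArchive);
    ) {
        final int bufferSize = 100 * 1024;
        byte[] buffer = new byte[bufferSize];

        // Note: the single entry is named after the zip file's own path
        archive.putNextEntry(new ZipEntry(zipArchive));

        // Copy the source file into the zip entry in 100 KB chunks
        int count;
        while ((count = file.read(buffer)) != -1) {
            archive.write(buffer, 0, count);
        }
        archive.closeEntry();
    }
}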
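For comparison, here is a minimal sketch of the same routine writing gzip instead of zip. It rests on an assumption about the cause that has not been confirmed for this job: the Pig documentation describes PigStorage as handling gzip and bzip2 input, selected by a .gz or .bz2 file extension, and does not list zip. The class name GzipCompressor and its parameter names are hypothetical and not part of the original program.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not from the original program): same copy loop as
// compressFile above, but producing a gzip stream instead of a zip archive.
class GzipCompressor {

    // gzipFile is assumed to end in ".gz" so the codec can be picked by extension.
    static void compressFile(String originalFile, String gzipFile) throws IOException {
        try (InputStream in = new FileInputStream(originalFile);
             OutputStream out = new GZIPOutputStream(new FileOutputStream(gzipFile))) {
            byte[] buffer = new byte[100 * 1024];
            int count;
            // Copy the source file into the gzip stream chunk by chunk
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
        } // closing the GZIPOutputStream also writes the gzip trailer
    }
}
```

With this variant the file would be uploaded as, for example, filename1.gz, and the load statement would point at that name. Whether this resolves the empty output is untested here.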

Any help is appreciated!

Thanks!

0 answers:

No answers yet