Question

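I am trying to split a large file (15 GB) into multiple smaller files based on a key column in the file. The same code works fine when I run it on a few thousand rows.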

My code is below.

REGISTER /home/auto/ssachi/piggybank-0.16.0.jar;
input_dt = LOAD '/user/ssachi/sywr_sls_ln_ofr_dtl/sywr_sls_ln_ofr_dtl.txt-10' USING PigStorage(',');
STORE input_dt into '/user/rahire/sywr_sls_ln_ofr_dtl_split' USING org.apache.pig.piggybank.storage.MultiStorage('/user/rahire/sywr_sls_ln_ofr_dtl_split','4','gz',',');

The error is as follows:

ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 6015: During execution, encountered a Hadoop error.

HadoopVersion 2.6.0-cdh5.8.2
PigVersion 0.12.0-cdh5.8.2

Assuming it was a memory problem, I tried setting the following parameters, but it did not help.

SET mapreduce.map.memory.mb 16000;
SET mapreduce.map.java.opts 14400;

After setting the parameters above, I got the following error.

Container exited with a non-zero exit code 1

org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1486048646102_2613_m_000066_3 Info:Exception from container-launch.

Answer 1

What is the cardinality of your "key column"? Is it in the 1000s?

If it is in the 1000s, you are getting the error because your mappers are dying with an OOME (OutOfMemoryError).
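One way to check the cardinality is to count the distinct values of the split column before running the split. This is a minimal sketch, not part of the original script. It assumes the key is the field at index 4 (counted from 0), matching the '4' passed to MultiStorage in the question; the aliases keys, uniq_keys, all_keys and key_count are made up for illustration.

```pig
-- Load the same input as in the question.
input_dt = LOAD '/user/ssachi/sywr_sls_ln_ofr_dtl/sywr_sls_ln_ofr_dtl.txt-10' USING PigStorage(',');

-- Keep only the split column (assumed to be field index 4, zero-based).
keys = FOREACH input_dt GENERATE $4 AS split_key;

-- One row per distinct key value.
uniq_keys = DISTINCT keys;

-- Count the distinct keys; COUNT skips rows whose key is null.
all_keys = GROUP uniq_keys ALL;
key_count = FOREACH all_keys GENERATE COUNT(uniq_keys) AS n_keys;

DUMP key_count;
```

The number printed is roughly how many output files each mapper may end up holding open at once during the split.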

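Keep in mind that each mapper now maintains 1000 file pointers, plus an associated buffer for each file pointer, which is enough to occupy the entire heap.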
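If the open writers are indeed what fills the heap, a possible workaround is to route the records through a reduce phase so that each task only sees a subset of the keys. The sketch below is an untested assumption, not something confirmed for this cluster: the PARALLEL value of 50 and the heap size are illustrative guesses, and by_key and regrouped are made-up aliases. It also passes the heap size as a JVM flag, since mapreduce.map.java.opts expects JVM options such as -Xmx rather than a bare number; a bare number like 14400 may be why the container failed to launch in the question.

```pig
REGISTER /home/auto/ssachi/piggybank-0.16.0.jar;

-- Container size in MB, with the JVM heap set somewhat below it (values are illustrative).
SET mapreduce.map.memory.mb 16000;
SET mapreduce.map.java.opts '-Xmx12800m';

input_dt = LOAD '/user/ssachi/sywr_sls_ln_ofr_dtl/sywr_sls_ln_ofr_dtl.txt-10' USING PigStorage(',');

-- Group on the split column so each reducer receives only some of the keys.
-- 50 reducers is an arbitrary choice; tune it to the number of distinct keys.
by_key = GROUP input_dt BY $4 PARALLEL 50;

-- Restore the original row layout before storing.
regrouped = FOREACH by_key GENERATE FLATTEN(input_dt);

STORE regrouped INTO '/user/rahire/sywr_sls_ln_ofr_dtl_split' USING org.apache.pig.piggybank.storage.MultiStorage('/user/rahire/sywr_sls_ln_ofr_dtl_split','4','gz',',');
```

With this layout the store runs in the reducers, so the reduce-side settings (mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts) would probably need similar values. The trade-off is an extra shuffle of the whole 15 GB input.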

Could you provide the logs of your mappers for further investigation?

See MultipleOutputs in MapReduce, which is called internally: http://bytepadding.com/big-data/map-reduce/multipleoutputs-in-map-reduce/

Pig script fails when processing a large file
