Question

The following code snippet works fine:

Configuration conf = new Configuration();

//PROBLEM PART!!!!!
//conf.setBoolean("mapred.compress.map.output", true);
//conf.set("mapred.output.compression.type", "BLOCK");
//conf.setClass("mapred.map.output.compression.codec", GzipCodec.class, CompressionCodec.class);

Job job = new Job(conf, "WordCount");
job.setJarByClass(WordCount.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

job.setMapperClass(WordCountMap.class);
job.setReducerClass(Reduce.class);

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

job.waitForCompletion(true);

But if I enable the PROBLEM PART in the snippet above, the console output gets stuck at:

13/12/26 18:08:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/12/26 18:08:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/26 18:08:06 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/12/26 18:08:06 INFO input.FileInputFormat: Total input paths to process : 20
13/12/26 18:08:06 WARN snappy.LoadSnappy: Snappy native library not loaded
13/12/26 18:08:06 INFO mapred.JobClient: Running job: job_local1943436108_0001
13/12/26 18:08:06 INFO mapred.LocalJobRunner: Waiting for map tasks
13/12/26 18:08:06 INFO mapred.LocalJobRunner: Starting task: attempt_local1943436108_0001_m_000000_0
13/12/26 18:08:07 INFO util.ProcessTree: setsid exited with exit code 0
13/12/26 18:08:07 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@731d2572
13/12/26 18:08:07 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/jude/input/capacity-scheduler.xml:0+7457
13/12/26 18:08:07 INFO mapred.MapTask: io.sort.mb = 100
13/12/26 18:08:07 INFO mapred.MapTask: data buffer = 79691776/99614720
13/12/26 18:08:07 INFO mapred.MapTask: record buffer = 262144/327680
13/12/26 18:08:07 INFO mapred.MapTask: Starting flush of map output
13/12/26 18:08:07 INFO compress.CodecPool: Got brand-new compressor
13/12/26 18:08:07 INFO mapred.MapTask: Starting flush of map output
13/12/26 18:08:07 INFO mapred.JobClient:  map 0% reduce 0%
13/12/26 18:08:12 INFO mapred.LocalJobRunner: 
13/12/26 18:08:13 INFO mapred.JobClient:  map 5% reduce 0%
(no further output after this line)

I only want to compress the map output. Is there something wrong with my code? Thanks a lot!

Answer 1

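Using compression requires Hadoop to use the native libraries for your platform, but apparently you don't have them (or the path to them is not configured correctly). This is the message that points to the problem: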

[...] NativeCodeLoader: Unable to load native-hadoop library for your platform...
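That warning means the JVM could not load Hadoop's native library. A minimal sketch for reproducing the check outside Hadoop, assuming (as far as we understand it) that Hadoop's NativeCodeLoader loads the library with System.loadLibrary("hadoop"); the class name NativeHadoopCheck is hypothetical:

```java
// NativeHadoopCheck.java (hypothetical diagnostic, not part of Hadoop)
// Assumption: Hadoop's NativeCodeLoader calls System.loadLibrary("hadoop"),
// so failing here should correspond to the "Unable to load native-hadoop library" warning.
public class NativeHadoopCheck {

    /** Returns true if the named native library can be loaded from java.library.path. */
    public static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            // Thrown when no matching file (e.g. libhadoop.so on Linux) is on the path,
            // or when the file exists but cannot be linked (e.g. a 32-bit library in a 64-bit JVM).
            System.err.println("Could not load " + libName + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        String lib = "hadoop";
        // The platform-specific file name the JVM searches for, e.g. "libhadoop.so" on Linux
        System.out.println("Looking for: " + System.mapLibraryName(lib));
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        System.out.println(tryLoad(lib)
                ? "native hadoop library loaded"
                : "native hadoop library NOT loaded");
    }
}
```

Run it with the same JVM and library settings used to launch the job; if it reports "NOT loaded", the error message printed to stderr usually says whether the file was missing or present but unusable.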

Possible solutions:

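The most common problem is using 32-bit native libraries on a 64-bit architecture. You can download pre-compiled 64-bit native libraries, or compile them yourself using mvn package -Pdist,native,docs (the native profile is what builds the native libraries).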
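Alternatively, you may need to configure the path to the native libraries correctly, either with -Djava.library.path or with LD_LIBRARY_PATH; other questions on this site explain how to do this.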
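Both causes above can be checked from the JVM itself. A minimal sketch, assuming a Unix-like system; the class NativeLibEnvCheck is hypothetical, and the 64-bit test is only a heuristic on the os.arch string (values such as "amd64" or "x86_64"), not an authoritative check:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: checks the two usual causes named above,
// a JVM/library architecture mismatch and a library path that misses libhadoop.
public class NativeLibEnvCheck {

    /** Heuristic: treats the JVM as 64-bit if os.arch mentions "64" (e.g. "amd64", "x86_64"). */
    public static boolean is64BitJvm(String osArch) {
        return osArch != null && osArch.contains("64");
    }

    /** Returns the directories in a path-separator-delimited list that contain fileName. */
    public static List<String> dirsContaining(String searchPath, String fileName) {
        List<String> hits = new ArrayList<>();
        if (searchPath == null) {
            return hits;
        }
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (!dir.isEmpty() && new File(dir, fileName).isFile()) {
                hits.add(dir);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // os.arch describes the running JVM, which is what the native library must match
        String arch = System.getProperty("os.arch");
        System.out.println("os.arch = " + arch
                + (is64BitJvm(arch) ? " (looks 64-bit)" : " (may be 32-bit)"));

        // The two settings the answer suggests configuring
        String libPath = System.getProperty("java.library.path");
        System.out.println("java.library.path = " + libPath);
        System.out.println("LD_LIBRARY_PATH   = " + System.getenv("LD_LIBRARY_PATH"));

        String fileName = System.mapLibraryName("hadoop"); // "libhadoop.so" on Linux
        List<String> hits = dirsContaining(libPath, fileName);
        if (hits.isEmpty()) {
            System.out.println(fileName + " was not found on java.library.path");
        } else {
            System.out.println(fileName + " found in: " + hits);
        }
    }
}
```

For example, java -Djava.library.path=/path/to/hadoop/native NativeLibEnvCheck (the directory is a placeholder). If the file is found but still fails to load, running the Unix file command on it shows whether it is a 32-bit or 64-bit ELF object.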

Why does Hadoop get stuck only after I set the map output compression properties?
