How to compress and decompress with Snappy in Hadoop

Date: 2015-02-21 18:44:12

Tags: java hadoop snappy

I am using the following code to compress

    Configuration conf = new Configuration();
    // compress the intermediate map output
    conf.setBoolean("mapred.compress.map.output", true);
    // use Snappy as the map-output codec
    conf.set("mapred.map.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec");

with the Snappy algorithm. But when compressing an input file of moderate size (70 to 100 MB), the compressed output comes out larger than the input file, and if I try an input directory containing all kinds of files (.jpg, .mp3, .mp4, etc.) totalling 100 to 150 MB, it fails with this error:

    log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    Java HotSpot(TM) Server VM warning: INFO: os::commit_memory(0x930c0000, 105119744, 0) failed; error='Cannot allocate memory' (errno=12)
    #
    # There is insufficient memory for the Java Runtime Environment to continue.
    # Native memory allocation (malloc) failed to allocate 105119744 bytes for committing reserved memory.
    # An error report file with more information is saved as:
    # /home/hduser/workspace/TestProject/hs_err_pid16619.log

Please advise me here: when I try to compress and decompress data with the Snappy algorithm, how can I compress the data with Snappy so that it takes up less space?
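For reference, this is roughly how I compress and decompress a single file outside of MapReduce (a simplified sketch; the class name SnappyFileCompressor and the command-line paths are placeholders, and it assumes the Hadoop native library with Snappy support is on java.library.path):

    import java.io.InputStream;
    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    public class SnappyFileCompressor {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // SnappyCodec needs the native hadoop library (libhadoop/libsnappy)
            CompressionCodec codec = ReflectionUtils.newInstance(SnappyCodec.class, conf);
            FileSystem fs = FileSystem.get(conf);

            Path in = new Path(args[0]);
            Path out = new Path(args[0] + codec.getDefaultExtension()); // ".snappy"

            // wrap the raw output stream in a compressing stream
            InputStream is = fs.open(in);
            OutputStream os = codec.createOutputStream(fs.create(out));
            IOUtils.copyBytes(is, os, conf); // copies and closes both streams

            // decompression is symmetric: wrap the input side instead, e.g.
            // InputStream back = codec.createInputStream(fs.open(out));
        }
    }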

I am using

Ubuntu 13.10 (32-bit), JDK 7 (32-bit), with hadoop-2.2.0.
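The full driver is set up roughly like this (a simplified sketch using the newer Hadoop 2.x property names, which replace the deprecated mapred.* keys shown above; SnappyCompressJob and the mapper/reducer wiring are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SnappyCompressJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hadoop 2.x names for the deprecated mapred.* properties above
            conf.setBoolean("mapreduce.map.output.compress", true);
            conf.setClass("mapreduce.map.output.compress.codec",
                          SnappyCodec.class, CompressionCodec.class);

            Job job = Job.getInstance(conf, "snappy compression test");
            job.setJarByClass(SnappyCompressJob.class);
            // ... mapper/reducer/key/value setup elided ...

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // also compress the final job output, not just the map output
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }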

0 Answers:

No answers