I store my table in SequenceFile format, and I run the following commands to enable BLOCK compression for the SequenceFile:
set mapred.output.compress=true;
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec;
But when I inspect the table like this:
describe extended lip_table
I get the output below, which contains a field named compressed that is set to false. Does this mean my data was not compressed by the three commands above?
Detailed Table Information Table(tableName:lip_table, dbName:default, owner:uname,
createTime:1343931235, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:
[FieldSchema(name:buyer_id, type:bigint, comment:null), FieldSchema(name:total_chkout,
type:bigint, comment:null), FieldSchema(name:total_errpds, type:bigint, comment:null)],
location:hdfs://ares-nn/apps/hdmi/uname/lip-data,
inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
**compressed:false**, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:
{serialization.format= , field.delim=
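Answer 0 (score: 2)
I found this article, which I think addresses your problem. You should try specifying the compression codec at the table-definition level, either when creating the table or with an ALTER statement.
At creation time: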
CREATE EXTERNAL TABLE lip_table (
column1 string
, column2 string
)
PARTITIONED BY (date string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION '/path/to/hive/tables/lip';
Using ALTER (this affects only partitions created afterwards):
ALTER TABLE lip_table
SET FILEFORMAT
INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
http://www.mrbalky.com/2011/02/24/hive-tables-partitions-and-lzo-compression/
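For the SequenceFile table in the question, a complementary step is to write the data with output compression switched on at the session level and then look at the files themselves. The following is only a minimal sketch under stated assumptions: `hive.exec.compress.output` is the Hive-level switch for compressing query output, `lip_table_staging` is a hypothetical source table with the same three columns, and the codec class is the one from the question and must be available on the cluster.

```sql
-- Assumption: Hive's own switch for compressing the final output of a query.
SET hive.exec.compress.output=true;
-- Settings taken from the question.
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec;

-- lip_table_staging is a hypothetical, uncompressed source table.
INSERT OVERWRITE TABLE lip_table
SELECT buyer_id, total_chkout, total_errpds
FROM lip_table_staging;

-- Hive CLI command: list the written files so their sizes can be
-- compared against a run without compression.
dfs -ls hdfs://ares-nn/apps/hdmi/uname/lip-data;
```

The sketch does not assume that the compressed field in the DESCRIBE EXTENDED output changes; comparing the file sizes is a more direct check.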
Answer 1 (score: 1)
To avoid SerDe exceptions, specify the SerDe class as well.
ALTER TABLE <<table name>>
SET FILEFORMAT
INPUTFORMAT "<<Input format class>>"
OUTPUTFORMAT
"<<Output format class>>" SERDE "<<Serde class>>";
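As an illustration, filling the placeholders with class names that already appear in this thread might look like the following. This is a sketch, not a verified configuration: it assumes the LZO input format from the other answer is on the classpath, and that the LazySimpleSerDe shown in the question's DESCRIBE EXTENDED output is the SerDe you want to keep.

```sql
-- Class names are copied from elsewhere in this thread; adjust to your setup.
ALTER TABLE lip_table
SET FILEFORMAT
INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
SERDE "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe";
```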