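I have some JSON data that secor saved to S3 in SequenceFile format. I want to analyze it with Pig. Using elephant-bird, I managed to load it from S3 as a bytearray, but I can't convert it to a chararray, which I apparently need in order to parse the JSON: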
%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare LONG_CONVERTER 'com.twitter.elephantbird.pig.util.LongWritableConverter';
%declare BYTES_CONVERTER 'com.twitter.elephantbird.pig.util.BytesWritableConverter';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
grunt> A = LOAD 's3n://...logs/raw_logs/...events/dt=2015-12-08/1_0_00000000000085594299'
USING $SEQFILE_LOADER ('-c $LONG_CONVERTER', '-c $BYTES_CONVERTER')
AS (key: long, value: bytearray);
grunt> B = LIMIT A 1;
grunt> DUMP B;
(85653965,{"key": "val1", other json data, ...})
grunt> DESCRIBE B;
B: {key: long,value: bytearray}
grunt> C = FOREACH B GENERATE (key, (chararray)value);
grunt> DUMP C;
2015-12-08 19:32:09,133 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1075: Received a bytearray from the UDF or Union from two different Loaders.
Cannot determine how to convert the bytearray to string.
Using TextConverter instead of BytesWritableConverter just leaves me with empty values.
Pig is clearly able to convert the bytearray to a string in order to dump it, so this shouldn't be impossible. How can I do it?
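One possible workaround, offered as a sketch rather than a confirmed fix: skip the cast and decode the bytes yourself in a custom Java UDF. The class name `BytesToChararray` and its `decode` method are hypothetical, and the code assumes secor wrote each JSON record as UTF-8 bytes. Only the decoding step is shown, so it runs without Pig on the classpath; in an actual UDF the same call would sit inside `EvalFunc<String>.exec(Tuple)`, with the bytes taken from `((DataByteArray) input.get(0)).get()`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing the decoding step a custom Pig UDF could perform.
// Assumption: the SequenceFile values written by secor are UTF-8 encoded JSON.
public class BytesToChararray {

    // Decode raw value bytes into a Java String (Pig's chararray type).
    // A null input returns null, matching how Pig propagates null fields.
    public static String decode(byte[] value) {
        if (value == null) {
            return null;
        }
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate a record value as it would arrive from the loader.
        byte[] raw = "{\"key\": \"val1\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(raw)); // prints {"key": "val1"}
    }
}
```

If this works for your data, you would wrap the logic in a UDF class, `REGISTER` its jar, and call it in place of the cast, e.g. `C = FOREACH B GENERATE key, BytesToChararray(value);` (again a hypothetical name), then hand the resulting chararray to your JSON parsing step.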