Pig: Parsing a bytearray as a string / JSON

Time: 2015-12-08 19:36:15

Tags: json hadoop apache-pig elephantbird

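I have some JSON data that secor saves to S3 in SequenceFile format, and I want to analyze it with Pig. Using elephant-bird I managed to load it from S3 as a bytearray, but I have not been able to convert it to a chararray, which is apparently required in order to parse the JSON: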

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare LONG_CONVERTER 'com.twitter.elephantbird.pig.util.LongWritableConverter';
%declare BYTES_CONVERTER 'com.twitter.elephantbird.pig.util.BytesWritableConverter';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';

grunt> A = LOAD 's3n://...logs/raw_logs/...events/dt=2015-12-08/1_0_00000000000085594299'
       USING $SEQFILE_LOADER ('-c $LONG_CONVERTER', '-c $BYTES_CONVERTER')
       AS (key: long, value: bytearray);
grunt> B = LIMIT A 1;
grunt> DUMP B;

(85653965,{"key": "val1", other json data, ...})

grunt> DESCRIBE B;

B: {key: long,value: bytearray}

grunt> C = FOREACH B GENERATE (key, (chararray)value);
grunt> DUMP C;

2015-12-08 19:32:09,133 [main] ERROR org.apache.pig.tools.grunt.Grunt -
   ERROR 1075: Received a bytearray from the UDF or Union from two different Loaders.
   Cannot determine how to convert the bytearray to string.

Using TextConverter in place of BytesWritableConverter just leaves empty values.

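Pig is clearly able to convert the bytearray to a string in order to dump it, so it does not seem like this should be impossible. How do I do it?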
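
For reference, the DUMP output suggests the value bytes hold JSON text, though the question does not confirm the encoding. The following is a minimal standalone sketch in plain Python (not Pig or elephant-bird API) of the two steps any custom conversion would need: decoding the bytes, then parsing the JSON. It assumes UTF-8; `bytes_to_text`, `text_to_json`, and the sample record are hypothetical names used only for illustration.

```python
import json
from typing import Any, Optional


def bytes_to_text(raw: Optional[bytes], encoding: str = "utf-8") -> Optional[str]:
    """Decode a raw byte payload into text; return None if it cannot be decoded."""
    if raw is None:
        return None
    try:
        # bytes(...) also accepts a bytearray, so both input types are handled.
        return bytes(raw).decode(encoding)
    except UnicodeDecodeError:
        return None


def text_to_json(text: Optional[str]) -> Optional[Any]:
    """Parse decoded text as JSON; return None for missing or malformed input."""
    if text is None:
        return None
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


# Hypothetical record shaped like the (key, value) pair in the DUMP output.
sample_key = 85653965
sample_value = b'{"key": "val1"}'  # assumed UTF-8 encoded JSON

record = text_to_json(bytes_to_text(sample_value))
print(sample_key, record["key"])
```

Returning None instead of raising mirrors the way a malformed field typically ends up as a null value rather than failing a whole job. Wiring this logic into Pig as a UDF is not shown here and would depend on the Pig version and UDF mechanism in use.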

0 answers:

No answers