Cast error in Pig script when trying to DUMP or STORE

Posted: 2015-05-22 04:57:26

Tags: apache-pig

After joining two datasets in my Pig script, I get a cast error. I am using HDP 2.2. The error is:

ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 0: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String

I get this error when I try to DUMP or STORE the result. Please advise.

My script is as follows:

complaint= load 'file1' using PigStorage('|');
extracted = foreach complaint generate $13 as complainant_first_name:chararray, $14 as complainant_last_name:chararray, $16 as hic:chararray;
filtered_com = filter extracted by hic IS NOT NULL;

mbr= load 'file2' using PigStorage(',');
extracted = foreach mbr generate $11 as first_nm:chararray, $12 as last_nm:chararray, $24 as medcr_nbr:chararray;
filtered_mbr = filter extracted by medcr_nbr is not null;

joined = join filtered_com by hic, filtered_mbr by medcr_nbr;
describe joined;
store joined into 'com_mbr' using PigStorage(',');

2 answers:

Answer 0 (score: 1)

We can specify the column data types when loading file1:
complaint = load 'file1' using PigStorage('|') as (col0:chararray, col1:chararray, .........);

Alternatively, we can cast each column to the desired data type in the FOREACH statement:

extracted = foreach complaint generate (chararray)$13 as complainant_first_name:chararray,
(chararray)$14 as complainant_last_name:chararray, (chararray)$16 as hic:chararray;

Do the same for file2. Hope this helps!
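For reference, here is a minimal sketch of the complete script with the explicit-cast approach applied to both files. It assumes the same file names and column positions as the question; the intermediate aliases com_fields and mbr_fields are hypothetical names, used here only to avoid reassigning extracted.

```pig
-- Load both files; with no schema, every field arrives as bytearray.
complaint = LOAD 'file1' USING PigStorage('|');
mbr = LOAD 'file2' USING PigStorage(',');

-- com_fields and mbr_fields are illustrative alias names.
-- The (chararray) casts convert the bytearray values to strings.
com_fields = FOREACH complaint GENERATE
    (chararray)$13 AS complainant_first_name,
    (chararray)$14 AS complainant_last_name,
    (chararray)$16 AS hic;
filtered_com = FILTER com_fields BY hic IS NOT NULL;

mbr_fields = FOREACH mbr GENERATE
    (chararray)$11 AS first_nm,
    (chararray)$12 AS last_nm,
    (chararray)$24 AS medcr_nbr;
filtered_mbr = FILTER mbr_fields BY medcr_nbr IS NOT NULL;

-- Both join keys are now chararray.
joined = JOIN filtered_com BY hic, filtered_mbr BY medcr_nbr;
DESCRIBE joined;
STORE joined INTO 'com_mbr' USING PigStorage(',');
```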

Answer 1 (score: 0)

The error you are seeing is:

*Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray incompatible with java.lang.String*

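When data is loaded into Pig without a schema, every field defaults to the bytearray type. So to perform any string operation on a field, you need to cast it to chararray.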
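A small sketch of that difference, assuming the file1 layout from the question: without a cast the field keeps the bytearray type, and DESCRIBE should report chararray once an explicit cast is added. The aliases hic_raw and hic_cast are hypothetical.

```pig
-- No schema in LOAD, so all fields are bytearray.
complaint = LOAD 'file1' USING PigStorage('|');

-- hic_raw and hic_cast are illustrative alias names.
-- Without a cast, hic is still a bytearray field.
hic_raw = FOREACH complaint GENERATE $16 AS hic;
DESCRIBE hic_raw;

-- With an explicit cast, hic becomes a chararray field.
hic_cast = FOREACH complaint GENERATE (chararray)$16 AS hic;
DESCRIBE hic_cast;
```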

You can get the output either by casting explicitly to chararray in the FOREACH statement, or by simply leaving the data as bytearray, as shown below:

complaint = LOAD 'sofile1.txt' USING PigStorage('|'); -- No schema, so every field gets the default bytearray type.
extracted = FOREACH complaint GENERATE $0 AS complaint_first_name, $1 AS complaint_last_name, $2 AS hic;
filtered_com = FILTER extracted BY hic IS NOT NULL;
mbr = LOAD 'sofile2.txt' USING PigStorage(',');
extracted = FOREACH mbr GENERATE $0 AS first_nm, $1 AS last_nm, $2 AS medcr_nbr;
filtered_mbr = FILTER extracted BY medcr_nbr IS NOT NULL;
joined_data = JOIN filtered_com BY hic, filtered_mbr BY medcr_nbr;
DESCRIBE joined_data; -- Prints the schema of the joined relation.
DUMP joined_data;     -- Prints the joined records.

This should print the results as well. Hope this helps.