Apache Pig error: why doesn't it accept all the columns as chararray?

Posted: 2016-06-27 14:19:47

Tags: apache-pig hdfs

I'm trying the following Pig script:

Source_Data = LOAD '/user/cloudera/Source_Data/' USING PigStorage('\t', '-tagFile');

Data_Schema = FOREACH Source_Data GENERATE (
(chararray)$1 AS Date,
(chararray)$2 AS ID,
(chararray)$3 AS Interval,
(chararray)$4 AS Code,
(chararray)$5 AS S_In_Activity,
(chararray)$6 AS S_Out_Activity,
(chararray)$7 AS C_In_Activity,
(chararray)$8 AS C_Out_Activity,
(chararray)$9 AS Traffic_Activity);

STORE Data_Schema INTO '/user/cloudera/Source_Data/New_Data/' USING PigStorage('\t');

Here is one line of my source data:

  

11300 1387926000000 76 1.8190562337403677 0.9613115354827483 330.0372865843317554633 0.1161754442265068633 11.04195619825027733

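But when I run the script I get an error, whereas if I remove the last part that defines the schema, it runs successfully. Note that the first column is inserted by the Pig statement.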

1 answer:

Answer (score: 0)

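You basically answered your own question in your last sentence. You cannot declare a schema when using the STORE operator. According to {{3}}, its syntax is: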

STORE alias INTO 'directory' [USING function];

In your case, it would simply be:

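Data = LOAD '/user/cloudera/Source' USING PigStorage('\t', '-tagFile');

-- The opening parenthesis after GENERATE was never closed,
-- so it is removed here to let the projection list parse.
Data_prestage = FOREACH Data GENERATE
(chararray)$1 AS Filename,
(chararray)$2 AS CCode,
(chararray)$3 AS SCode,
(chararray)$4 AS In_Act,
(chararray)$5 AS Out_Act,
(chararray)$6 AS In_Act1;

-- STORE takes no schema, only an alias, a directory and an optional storage function.
STORE Data_prestage INTO '/user/cloudera/Source/Data2/' USING PigStorage('\t');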

Also, if you don't intend to do any processing on the data, you might consider using STREAM.
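A minimal STREAM sketch, assuming the same input path as above and using the Unix `cat` command as a pass-through so each record is written out unchanged. The output directory `Data3` is hypothetical, and `cat` is assumed to be available on the cluster nodes.

```pig
-- Load the files as before; -tagFile prepends each record's source file name.
Data = LOAD '/user/cloudera/Source' USING PigStorage('\t', '-tagFile');

-- Pipe every record through `cat` (assumed present on the nodes). With the
-- default streaming (de)serializer, fields are exchanged tab-delimited.
Passed = STREAM Data THROUGH `cat`;

-- Hypothetical output directory.
STORE Passed INTO '/user/cloudera/Source/Data3/' USING PigStorage('\t');
```

Without an AS clause the streamed fields come back untyped; an `AS (...)` schema can be appended to the STREAM statement if typed fields are needed downstream.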