I want to join two tables. table1 has the columns id and value; table2 has the columns id and color.
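The table I get back has the columns id, value, id, color, but I want a table with only the columns id, value, and color. How do I remove the duplicate id column from this table?
Answer 0: (score: 0)
If you run DESCRIBE final;, you will see that the schema looks like this: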
final: {table1::id: chararray,table1::value: chararray,table2::id: chararray,table2::color: chararray}
To tell the two id columns apart, you can refer to them as table1::id or table2::id. So, to drop one of the duplicate columns, you can do the following:
A = FOREACH final GENERATE
table1::id AS id,
table1::value AS value,
table2::color AS color;
(I also renamed the fields to drop the table1:: and table2:: prefixes, since they are no longer needed.)
I could also have written:
A = FOREACH final GENERATE
table1::id AS id,
value AS value,
color AS color;
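This would not produce an error, because value and color are unambiguous names: each appears in only one of the joined tables. A bare id, by contrast, would be ambiguous, because both tables contribute an id column.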
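As a hedged aside, the same projection can probably also be written with positional references instead of names. This is only a sketch: it assumes final has exactly the four-column schema shown in the DESCRIBE output above (so $0 is table1::id, $1 is table1::value, $2 is table2::id, and $3 is table2::color), and the alias B is a hypothetical name that does not appear in the original question.

```pig
-- Sketch only: assumes `final` has the four-column schema shown above.
-- $2 (table2::id) is left out of the projection, which drops the duplicate.
B = FOREACH final GENERATE
    $0 AS id,
    $1 AS value,
    $3 AS color;

-- Inspect the resulting schema to confirm that only three fields remain.
DESCRIBE B;
```

Positional references depend on the column order of the join, so the named form (table1::id and so on) is usually the safer choice if the schema may change.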
Answer 1: (score: 0)
The complete Pig script, run from the Grunt shell:
grunt> table1 = LOAD 'table1_input_path' USING PigStorage(',') as (id:int, value:int);
grunt> table2= LOAD 'table2_input_path' USING PigStorage(',') as (id:int, color:chararray);
grunt> joinlevel = JOIN table1 BY id, table2 BY id;
grunt> final = FOREACH joinlevel generate table1::id as id, table1::value as value, table2::color as color;
grunt> dump final;