I admit the title of this question is vague. If someone can rephrase it after reading the question, that would be great.
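Anyway, I have a relation that holds pairs of word IDs in two fields, and I want to replace each ID with its word text. Right now I do it with two JOINs and two FOREACHes, like this: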
WordIDs = LOAD 'wordID.txt' AS (wordID1:long, wordID2:long);
WordTexts = LOAD 'wordText.txt' AS (wordID:long, wordText:chararray);
Join1 = JOIN WordIDs BY wordID1, WordTexts BY wordID;
Replaced1 = FOREACH Join1 GENERATE WordTexts::wordText As wordText1, WordIDs::wordID2;
Join2 = JOIN Replaced1 BY wordID2, WordTexts BY wordID;
Replaced2 = FOREACH Join2 GENERATE Replaced1::wordText1 AS wordText1, WordTexts::wordText AS wordText2;
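To show what this script computes and why it takes two passes, here is a minimal plain-Python model (not Pig). The function `reduce_side_join` and the sample data are hypothetical. The function imitates a default, shuffle-based join by bucketing both inputs on the join key. Because Join1 is keyed on wordID1 and Join2 on wordID2, each join needs its own bucketing pass, which is roughly why the two Pig joins cannot share one shuffle.

```python
from collections import defaultdict


def reduce_side_join(left, left_key, right, right_key):
    """Hypothetical model of a shuffle-based inner equi-join.

    left_key / right_key are tuple indexes of the join field. Both inputs are
    bucketed by key (the "shuffle"), then every left record is paired with
    every right record that landed in the same bucket.
    """
    buckets = defaultdict(lambda: ([], []))
    for rec in left:
        buckets[rec[left_key]][0].append(rec)
    for rec in right:
        buckets[rec[right_key]][1].append(rec)
    # Buckets with an empty side produce nothing: inner-join semantics.
    return [(lrec, rrec) for lefts, rights in buckets.values()
            for lrec in lefts for rrec in rights]


word_ids = [(1, 2), (2, 3)]                            # (wordID1, wordID2)
word_texts = [(1, "hello"), (2, "world"), (3, "pig")]  # (wordID, wordText)

# Join1 + Replaced1: first shuffle, keyed on wordID1 -> (wordText1, wordID2)
replaced1 = [(t[1], ids[1]) for ids, t in reduce_side_join(word_ids, 0, word_texts, 0)]

# Join2 + Replaced2: second shuffle, keyed on wordID2 -> (wordText1, wordText2)
replaced2 = [(r[0], t[1]) for r, t in reduce_side_join(replaced1, 1, word_texts, 0)]

print(sorted(replaced2))
# [('hello', 'world'), ('world', 'pig')]
```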
Is there a way to do this with fewer statements (for example, one join instead of two)?
Answer (score: 1)
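I think your current code generates two separate MapReduce jobs. To avoid that, use a replicated join. It doesn't reduce the number of JOIN statements, but each join becomes a map-side join, so the whole script should run as a single map-only job. This only works if WordTexts is small enough to fit in memory, because Pig loads the last relation of a replicated join into memory on every mapper. The code should look something like this (I haven't run it):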
WordIDs = LOAD 'wordID.txt' AS (wordID1:long, wordID2:long);
WordTexts = LOAD 'wordText.txt' AS (wordID:long, wordText:chararray);
-- 'replicated': the last relation listed (WordTexts) is held in memory on each mapper
Join1 = JOIN WordIDs BY wordID1, WordTexts BY wordID USING 'replicated';
Join2 = JOIN Join1 BY wordID2, WordTexts BY wordID USING 'replicated';
Replaced = FOREACH Join2 GENERATE Join1::WordTexts::wordText AS wordText1, WordTexts::wordText AS wordText2;
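As a rough illustration of what `USING 'replicated'` buys you, here is a plain-Python model (not Pig): the small relation becomes an in-memory hash table, and the large relation is streamed past it once, resolving both IDs per record with no shuffle. `replicated_lookup` and the sample data are hypothetical, and `dict(word_texts)` assumes each wordID appears only once in WordTexts (a real join would emit one row per duplicate).

```python
def replicated_lookup(word_ids, word_texts):
    """Hypothetical model of a map-side (replicated) join.

    word_ids:   iterable of (wordID1, wordID2) pairs, the large relation
    word_texts: iterable of (wordID, wordText) pairs, the small relation
    """
    # Build the in-memory table once; in Pig each mapper holds a copy.
    # Assumes wordID is unique in word_texts.
    text_by_id = dict(word_texts)

    out = []
    # Single pass over the large relation: both IDs resolved per record.
    for id1, id2 in word_ids:
        # Inner-join semantics: skip the pair if either ID has no text.
        if id1 in text_by_id and id2 in text_by_id:
            out.append((text_by_id[id1], text_by_id[id2]))
    return out


word_ids = [(1, 2), (2, 3), (4, 1)]
word_texts = [(1, "hello"), (2, "world"), (3, "pig")]
print(replicated_lookup(word_ids, word_texts))
# [('hello', 'world'), ('world', 'pig')]
```

The pair `(4, 1)` is dropped because ID 4 has no text, matching the inner-join behavior of both Pig scripts.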