I am new to Pig, and I have two datasets, "highspender" and "feedback".
Highspender:
Price,fname,lname
$50,Jack,Brown
$30,Rovin,Pall
Feedback:
date,Name,rate
2015-01-02,Jack B Brown,5
2015-01-02,Pall,4
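Now I have to join these two datasets by name. My condition is that either fname or lname from Highspender should match Name in Feedback. How can I join these two datasets? Any ideas?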
Answer 0 (score: 0)
You can try the script below to do this; just replace the names to match your data:
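-- Load both datasets as comma-separated text
highs = LOAD 'highs' USING PigStorage(',') AS (Price:chararray, fname:chararray, lname:chararray);
feedback = LOAD 'feeds' USING PigStorage(',') AS (date:chararray, Name:chararray, rate:chararray);
-- Equi-joins: these match only when Name equals fname (or lname) exactly
out = JOIN highs BY fname, feedback BY Name;
out1 = JOIN highs BY lname, feedback BY Name;
-- Combine the rows matched on either column
final_out = UNION out, out1;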
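For further help, see the Pig Reference manual.
Edit
Following the comments, here is a script that joins the data using string functions, so that a Name such as "Jack B Brown" matches when it merely contains fname or lname: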
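highs = LOAD 'highs' USING PigStorage(',') AS (Price:chararray, fname:chararray, lname:chararray);
feedback = LOAD 'feeds' USING PigStorage(',') AS (date:chararray, Name:chararray, rate:chararray);
-- Pair every highs row with every feedback row (expensive on large inputs)
crossout = CROSS highs, feedback;
-- REPLACE alters Name only if the pattern occurs in it, so "!=" means Name contains lname (or fname)
final_lname = FILTER crossout BY (REPLACE(feedback::Name, highs::lname, '') != feedback::Name);
final_fname = FILTER crossout BY (REPLACE(feedback::Name, highs::fname, '') != feedback::Name);
-- A pair that matches on both fname and lname appears twice in the union
final = UNION final_lname, final_fname;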
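
As a possible alternative, here is a minimal sketch that does the same substring match in a single FILTER. It assumes the built-in INDEXOF function is available in your Pig version and that a plain, case-sensitive substring test is what you want; the alias names crossed and matched are made up for illustration. Because the two conditions are combined with OR, a pair that matches on both fname and lname is kept only once. INDEXOF also searches for the literal text, whereas REPLACE treats its second argument as a regular expression, so names containing regex metacharacters should be safer here.

```pig
-- Sketch only: same inputs and schemas as the script above
highs = LOAD 'highs' USING PigStorage(',') AS (Price:chararray, fname:chararray, lname:chararray);
feedback = LOAD 'feeds' USING PigStorage(',') AS (date:chararray, Name:chararray, rate:chararray);

-- Pair every highs row with every feedback row
crossed = CROSS highs, feedback;

-- Keep a pair when Name contains fname or lname;
-- INDEXOF returns -1 when the search string is not found
matched = FILTER crossed BY
    (INDEXOF(feedback::Name, highs::fname, 0) >= 0)
    OR (INDEXOF(feedback::Name, highs::lname, 0) >= 0);

DUMP matched;
```

With the sample data from the question, this should pair (Jack, Brown) with "Jack B Brown" and (Rovin, Pall) with "Pall". CROSS still compares every pair of rows, so it may be slow on large datasets.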