Question

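I am using a hash join on some sample data to join a small table to a larger one. In this example, '_1080544_27_08_2016' is the larger table and '_2015_2016_playerlistlookup' is the smaller table. Here is my code: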

data both(drop=rc);
 declare Hash Plan 
 (dataset: 'work._2015_2016_playerlistlookup');                             /* declare the name Plan for hash */
 rc = plan.DefineKey ('Player_ID');                                         /* identify fields to use as keys */
 rc = plan.DefineData ('Player_Full_Name', 
 'Player_First_Name', 'Player_Last_Name', 
 'Player_ID2');                                                                 /* identify fields to use as data */
 rc = plan.DefineDone ();                                                   /* complete hash table definition */
 do until (eof1) ;                                                          /* loop to read records from _1080544_27_08_2016 */
 set _1080544_27_08_2016 end = eof1;
 rc = plan.add ();                                                          /* add each record to the hash table */
 end;
 do until (eof2) ;                                                          /* loop to read records from _2015_2016_playerlistlookup */
 set _2015_2016_playerlistlookup end = eof2;
 call missing(Player_Full_Name, 
 Player_First_Name, Player_Last_Name);                                      /* initialize the variable we intend to fill */
 rc = plan.find ();                                                         /* lookup each plan_id in hash Plan */
 output;                                                                    /* write record to Both */
 end;
 stop;
run;

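This produces a table with the same number of rows as the smaller lookup table. What I would like to see is a table with the same number of rows as the larger table, with the additional fields from the lookup table joined on by the primary key.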

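The larger table contains duplicates of the primary key. That is, the key is not unique in that table (it is not, for example, based on row number).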

Can someone tell me what I need to change in my code?

Thanks

Answer 1

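You are loading both datasets into the hash object: the small dataset when you declare it, and then the large dataset in your first do loop. This makes no sense to me, unless you have lookup values already populated for some but not all of the rows in your large dataset and you are trying to carry them over between rows.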
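One way to see what the `dataset:` argument does on its own is to count the items in the hash object immediately after `DefineDone`, before any `add()` call. The step below is only a diagnostic sketch: it assumes the lookup table has the columns named in the question, and the variable `n_items` is a name introduced here for illustration.

```sas
/* Diagnostic sketch: how many items does the hash hold right after it is defined? */
data _null_;
  if 0 then set work._2015_2016_playerlistlookup;   /* define the host variables without reading any rows */
  declare hash plan(dataset: 'work._2015_2016_playerlistlookup');
  rc = plan.DefineKey('Player_ID');
  rc = plan.DefineData('Player_Full_Name', 'Player_First_Name',
                       'Player_Last_Name', 'Player_ID2');
  rc = plan.DefineDone();          /* the lookup table is loaded into the hash at this point */
  n_items = plan.num_items;        /* n_items is a hypothetical name; num_items is the hash attribute */
  put 'Items loaded from the lookup table: ' n_items;
  stop;                            /* no SET statement ever executes, so end the step explicitly */
run;
```

If the count written to the log is already the number of distinct Player_ID values in the lookup table, the extra loop of `add()` calls over the large table is not needed for a plain lookup.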

You are then looping through the lookup dataset and producing 1 output row for each row of that dataset.

It is not clear exactly what you are trying to do here, as this is not a standard use case for hash objects.

Here is my best guess at what you are trying to do. If this is not what you want, please post sample input and expected output datasets.

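data want;
 set _1080544_27_08_2016;                       /* drive the step from the large table: one output row per input row */
 if 0 then set _2015_2016_playerlistlookup;     /* add the lookup variables to the PDV without reading any rows */
 if _n_ = 1 then do;                            /* build the hash object once, on the first iteration */
   declare Hash Plan(dataset: 'work._2015_2016_playerlistlookup');
   rc = plan.DefineKey ('Player_ID');
   rc = plan.DefineData ('Player_Full_Name', 'Player_First_Name', 'Player_Last_Name', 'Player_ID2');
   rc = plan.DefineDone ();
 end;
 call missing(Player_Full_Name, Player_First_Name, Player_Last_Name);   /* reset lookup fields before each lookup */
 rc = plan.find();                              /* look up the current Player_ID; rc = 0 when a match is found */
 drop rc;
run;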
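To illustrate how this pattern behaves when the large table repeats keys, here is a small self-contained sketch. The tables `lookup` and `big`, the column `Event`, and all data values are made up for the example; only the structure of the final step follows the code above.

```sas
/* Made-up toy data: 'lookup' has one row per Player_ID, 'big' repeats keys */
data lookup;
  input Player_ID Player_Full_Name :$20.;
  datalines;
1 Alice
2 Bob
;
run;

data big;
  input Player_ID Event :$10.;
  datalines;
1 pass
1 shot
2 pass
3 tackle
;
run;

data want;
  set big;                          /* one output row per row of the large table */
  if 0 then set lookup;             /* define the lookup variables in the PDV */
  if _n_ = 1 then do;               /* load the small table into the hash once */
    declare hash plan(dataset: 'work.lookup');
    rc = plan.DefineKey('Player_ID');
    rc = plan.DefineData('Player_Full_Name');
    rc = plan.DefineDone();
  end;
  call missing(Player_Full_Name);   /* avoid carrying a name over from the previous row */
  rc = plan.find();                 /* rc = 0 on a match, non-zero otherwise */
  drop rc;
run;
```

With this toy data, `want` should have four rows, one per row of `big`. Both rows for Player_ID 1 pick up the same name, and the row for Player_ID 3, which has no match in `lookup`, is kept with a missing Player_Full_Name. Duplicate keys only need to be unique on the hash side (the small table), not in the table being read with `set`.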

Hash join not behaving as required
