Combining two SAS files based on two columns in file A

Time: 2018-04-04 15:38:23

Tags: merge sas mainframe

File A:

  C1(name)       C2(other name)
Apple      Fruit_1 Fruit_2
Orange     Fruit_1 Fruit_2
Carrot     Vegetable_1 Vegetable_2
Potato     Vegetable_1 Vegetable_2

File B:

 C1(name)    C2 (last used)
Apple        2014  
Fruit_1      2011 
Carrot       2010
Vegetable_2  2018

Desired result:

  C1(name)       C2(other name)       C3(last used)
Apple      Fruit_1 Fruit_2           2014 
Orange     Fruit_1 Fruit_2           2011
Carrot     Vegetable_1 Vegetable_2   2018
Potato     Vegetable_1 Vegetable_2   2018

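Basically, I want to merge my files based on two columns of the first file. If a "name" from file B is found in either column C1 or column C2 of file A, add its date. Note that Carrot gets the same date as Potato: Vegetable_2 has the most recent date, and Vegetable_2 can refer to either Carrot or Potato.

In my tests with MERGE I could not get the merge to check the second column, so I only get data for items whose C1 value is present in both files.

So my current result is: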

  C1(name)       C2(other name)       C3(last used)
Apple      Fruit_1 Fruit_2           2014 
Orange     Fruit_1 Fruit_2           
Carrot     Vegetable_1 Vegetable_2   2010
Potato     Vegetable_1 Vegetable_2   

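Any ideas on which SAS procedure could be used to get the desired result? FYI, I am using SAS on a mainframe. I am not sure whether that changes anything, since I have not used SAS outside a mainframe environment.

2 answers:

Answer 0 (score: 1)

You just need to left join file B to file A, using the CONTAINS operator in the ON clause.

A row from file B is joined when its c1 equals c1 in file A, or when its c1 occurs as a substring of c2 in file A.

Dummy data: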

data file_a;
length c1 $ 8 c2 $ 30 ;
input c1 $ c2 $ ;
datalines;
Apple      Fruit_1,Fruit_2
Orange     Fruit_1,Fruit_2
Carrot     Vegetable_1,Vegetable_2
Potato     Vegetable_1,Vegetable_2
;
data file_b;
length c1 $ 12 c2 $ 4 ;
input c1 $ c2 $ ;
datalines;
Apple        2014  
Fruit_1      2011 
Carrot       2010
Vegetable_2  2018
;

Code:

proc sql;
create table want as 
select a.*, b.c2 as last_used  , b.c1 as cc
from file_a as a left join file_b as b
on a.c1= b.c1 or a.c2 contains b.c1
;
quit;

Output:

c1=Apple c2=Fruit_1,Fruit_2 last_used=2014 cc=Apple 
c1=Carrot c2=Vegetable_1,Vegetable_2 last_used=2010 cc=Carrot 
c1=Carrot c2=Vegetable_1,Vegetable_2 last_used=2018 cc=Vegetable_2 
c1=Potato c2=Vegetable_1,Vegetable_2 last_used=2018 cc=Vegetable_2 
c1=Orange c2=Fruit_1,Fruit_2 last_used=  cc=  
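
In the output above, Orange gets no match and Carrot appears twice. A likely cause of the missing match is that b.c1 is compared with its trailing blanks, so 'Fruit_1' padded to 12 characters is not found inside 'Fruit_1,Fruit_2'. The following is a minimal sketch, not part of the original answer, that trims b.c1 before the comparison and keeps only the largest year per row of file_a. The table name want_max and the numeric conversion with input() are assumptions made for illustration.

```sas
proc sql;
  /* want_max is a hypothetical name; one output row per row of file_a */
  create table want_max as
  select a.c1,
         a.c2,
         /* c2 in file_b is character, so convert it to a number before taking the max */
         max(input(b.c2, 4.)) as last_used
  from file_a as a left join file_b as b
    /* trim() drops the trailing blanks of b.c1 so that the substring test can succeed */
    on a.c1 = b.c1 or a.c2 contains trim(b.c1)
  group by a.c1, a.c2;
quit;
```

Because CONTAINS is a plain substring test, a name such as Fruit_1 would also match inside Fruit_10; if such names can occur, a word-based comparison would be safer.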

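Answer 1 (score: 1)

A merge-only approach requires:

  • Pivoting the first table so that each key gets its own row
  • Sorting both tables by key to prepare for the merge
  • Merging
  • Sorting to restore the original row order, with year descending
  • Selecting the most recent year

Sample data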

data foods;
length key1 $20 key2s $50;
input key1 key2s &; datalines;
Apple      Fruit_1 Fruit_2
Orange     Fruit_1 Fruit_2
Carrot     Vegetable_1 Vegetable_2
Potato     Vegetable_1 Vegetable_2
Knuckle    Sandwich_1 Sandwich_2
run;

data dates;
length key $20 year 8;
input key year; datalines;
Apple        2014  
Fruit_1      2011 
Carrot       2010
Vegetable_2  2018
Grain_1      2009
run;

Pivot each row to get one row per key value

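data food_single_keyed;
  length key $20;
  set foods;

  rowid = _n_;                    /* remember the original row order */

  key = key1; output;             /* one row for the primary name */
  do i = 1 by 1;                  /* one row for each name listed in key2s */
    key = scan(key2s,i); 
    if missing(key) then leave;   /* no more names in this row */
    output;
    if i > 10 then leave;         /* safety cap; STOP here would end the whole DATA step */
  end;
  drop i;
run;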

Sort by key to prepare for the merge by key

proc sort data=food_single_keyed;
  by key;
run;

proc sort data=dates;
  by key;
run;

Merge by key

data foods_dated;
  merge food_single_keyed dates;
  by key;
run;

Sort to prepare for the final selection

proc sort data=foods_dated;
  by rowid descending year ;
run;

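Select only the first row for each rowid, which after the descending sort carries the most recent associated year. The `if rowid;` test drops rows that came only from dates and matched no food. You can also keep the key column to see which value was used to select the year.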

data want (keep=key1 key2s year);
  set foods_dated;
  by rowid;
  if rowid;
  if first.rowid;
run;