Question

File A:

C1 (name)   C2 (other name)
Apple       Fruit_1 Fruit_2
Orange      Fruit_1 Fruit_2
Carrot      Vegetable_1 Vegetable_2
Potato      Vegetable_1 Vegetable_2

File B:

C1 (name)     C2 (last used)
Apple         2014
Fruit_1       2011
Carrot        2010
Vegetable_2   2018

Desired result:

C1 (name)   C2 (other name)           C3 (last used)
Apple       Fruit_1 Fruit_2           2014
Orange      Fruit_1 Fruit_2           2011
Carrot      Vegetable_1 Vegetable_2   2018
Potato      Vegetable_1 Vegetable_2   2018

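Basically, I want to merge my files based on two columns of the first file: if a name in file B is found in either column C1 or column C2 of file A, add its date. When several names match, the most recent date is kept. Note that Carrot gets the same date as Potato: Vegetable_2 has the most recent date, and Vegetable_2 can refer to either Carrot or Potato.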

In my tests with MERGE, I couldn't make the merge also check the second column, so I only get data for items whose C1 value exists in both files.

So my current result is:

C1 (name)   C2 (other name)           C3 (last used)
Apple       Fruit_1 Fruit_2           2014
Orange      Fruit_1 Fruit_2
Carrot      Vegetable_1 Vegetable_2   2010
Potato      Vegetable_1 Vegetable_2

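Any ideas on which SAS procedure I could use to get the desired result? FYI, I'm using SAS on a mainframe. I'm not sure whether that changes anything, as I haven't used SAS outside the mainframe environment.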

Answer 1

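You just need to LEFT JOIN file B to file A, using the CONTAINS operator in the ON clause.

Join when table1.c1 = table2.c1, or when table2.c1 is a substring of table1.c2, where table1 is file A and table2 is file B.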

Dummy data:

data file_a;
length c1 $ 8 c2 $ 30 ;
input c1 $ c2 $ ;
datalines;
Apple      Fruit_1,Fruit_2
Orange     Fruit_1,Fruit_2
Carrot     Vegetable_1,Vegetable_2
Potato     Vegetable_1,Vegetable_2
;
data file_b;
length c1 $ 12 c2 $ 4 ;
input c1 $ c2 $ ;
datalines;
Apple        2014  
Fruit_1      2011 
Carrot       2010
Vegetable_2  2018
;

Code:

proc sql;
create table want as 
select a.*, b.c2 as last_used  , b.c1 as cc
from file_a as a left join file_b as b
on a.c1= b.c1 or a.c2 contains b.c1
;
quit;

Output:

c1=Apple c2=Fruit_1,Fruit_2 last_used=2014 cc=Apple 
c1=Carrot c2=Vegetable_1,Vegetable_2 last_used=2010 cc=Carrot 
c1=Carrot c2=Vegetable_1,Vegetable_2 last_used=2018 cc=Vegetable_2 
c1=Potato c2=Vegetable_1,Vegetable_2 last_used=2018 cc=Vegetable_2 
c1=Orange c2=Fruit_1,Fruit_2 last_used=  cc=
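In the output above, Orange gets no year even though file B lists Fruit_1 (2011), and Carrot appears twice. A likely cause of the first problem is that B.C1 has length 12, so Fruit_1 is searched for together with its trailing blanks and is not found where a comma follows it; Vegetable_2 only matches because it sits at the end of A.C2, where A.C2's own padding supplies the blank. The following untested sketch, assuming the FILE_A and FILE_B data sets above (WANT2 is just an illustrative name), trims B.C1 and uses MAX with GROUP BY so that each row of file A keeps only its most recent year.

```sas
/* Untested sketch; assumes FILE_A and FILE_B from the dummy data above. */
proc sql;
  create table want2 as                      /* WANT2: illustrative name        */
  select a.c1,
         a.c2,
         max(input(b.c2, 4.)) as last_used   /* year text -> number, keep latest */
  from file_a as a
       left join file_b as b
    on a.c1 = b.c1
       or a.c2 contains trim(b.c1)           /* TRIM drops B.C1 trailing blanks  */
  group by a.c1, a.c2                        /* one output row per file A row    */
  order by a.c1;
quit;
```

With the dummy data this should give Apple 2014, Carrot 2018, Orange 2011 and Potato 2018, in alphabetical order rather than file order. Because CONTAINS is a substring test, a name such as Fruit_1 would also match Fruit_10; if that can occur, a word-level test such as FINDW(a.c2, trim(b.c1), ',') > 0 is stricter.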

Answer 2

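A merge-only approach needs these steps:

1. Pivot the first table row by row (one row per key value)
2. Sort by key to prepare for the merge
3. Merge
4. Sort to restore the original row order, with years in descending order
5. Select the most recent year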

Sample data

data foods;
length key1 $20 key2s $50;
input key1 key2s &; datalines;
Apple      Fruit_1 Fruit_2
Orange     Fruit_1 Fruit_2
Carrot     Vegetable_1 Vegetable_2
Potato     Vegetable_1 Vegetable_2
Knuckle    Sandwich_1 Sandwich_2
run;

data dates;
length key $20 year 8;
input key year; datalines;
Apple        2014  
Fruit_1      2011 
Carrot       2010
Vegetable_2  2018
Grain_1      2009
run;

Pivot each row so that there is one row per key value

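data food_single_keyed;
  length key $20;
  set foods;

  rowid = _n_;                /* remember the original row order */

  key = key1; output;         /* one row for the C1 name */
  do i = 1 to countw(key2s);  /* one row per name in C2; avoids STOP, which would end the whole step */
    key = scan(key2s, i);
    output;
  end;
  drop i;
run;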

Sort by key to prepare the merge by key

proc sort data=food_single_keyed;
  by key;
run;

proc sort data=dates;
  by key;
run;

Merge by key

data foods_dated;
  merge food_single_keyed dates;
  by key;
run;

Sort to prepare the final selection

proc sort data=foods_dated;
  by rowid descending year ;
run;

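For each original row, select only the first (that is, most recent) associated year. You can also keep the key column to see which value was used to select the year.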

data want (keep=key1 key2s year);
  set foods_dated;
  by rowid;
  if rowid;
  if first.rowid;
run;
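A minimal variant of the last step that keeps the key column, as suggested above, could look like this sketch; it assumes FOODS_DATED is sorted BY ROWID DESCENDING YEAR as in the previous step, and WANT_WITH_KEY is a hypothetical output name.

```sas
/* Sketch: same selection as WANT, but KEY is kept so you can see  */
/* which name (C1 or one of the C2 values) supplied the year.       */
/* Assumes FOODS_DATED is sorted BY ROWID DESCENDING YEAR.          */
data want_with_key (keep=key1 key2s key year);
  set foods_dated;
  by rowid;
  if rowid;          /* drop date-only keys such as Grain_1 (missing ROWID) */
  if first.rowid;    /* first row per ROWID carries the most recent year    */
run;
```

When no key matched (Knuckle in the sample data), YEAR is missing and KEY is simply one of that row's names.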

Combining two SAS files based on two columns in file A
