File A:
C1(name) C2(other name)
Apple Fruit_1 Fruit_2
Orange Fruit_1 Fruit_2
Carrot Vegetable_1 Vegetable_2
Potato Vegetable_1 Vegetable_2
File B:
C1(name) C2 (last used)
Apple 2014
Fruit_1 2011
Carrot 2010
Vegetable_2 2018
Desired result:
C1(name) C2(other name) C3(last used)
Apple Fruit_1 Fruit_2 2014
Orange Fruit_1 Fruit_2 2011
Carrot Vegetable_1 Vegetable_2 2018
Potato Vegetable_1 Vegetable_2 2018
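Basically, I want to merge my files based on the two columns in the first file. If the "name" from file B is found in column C1 or C2 of file A, add the date. Note that Carrot gets the same date as Potato: Vegetable_2 has the most recent date, and Vegetable_2 can refer to either Carrot or Potato.
In my tests using MERGE, I could not get the merge to check the second column, so I only get data for items that are present in C1 of both files.
So my current result is: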
C1(name) C2(other name) C3(last used)
Apple Fruit_1 Fruit_2 2014
Orange Fruit_1 Fruit_2
Carrot Vegetable_1 Vegetable_2 2010
Potato Vegetable_1 Vegetable_2
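Any ideas on which SAS procedure could be used to get the desired result? FYI, I am using SAS on a mainframe. I am not sure whether that changes anything, since I have not used SAS outside a mainframe environment.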
Answer 0 (score: 1)
You just need to left join file B to file A and use the CONTAINS operator in the ON clause.
A row joins when table1.c1 = table2.c1, or when table2.c1 is a substring of table1.c2 (file A).
Dummy data:
data file_a;
length c1 $ 8 c2 $ 30 ;
input c1 $ c2 $ ;
datalines;
Apple Fruit_1,Fruit_2
Orange Fruit_1,Fruit_2
Carrot Vegetable_1,Vegetable_2
Potato Vegetable_1,Vegetable_2
;
data file_b;
length c1 $ 12 c2 $ 4 ;
input c1 $ c2 $ ;
datalines;
Apple 2014
Fruit_1 2011
Carrot 2010
Vegetable_2 2018
;
Code:
proc sql;
create table want as
select a.*, b.c2 as last_used , b.c1 as cc
from file_a as a left join file_b as b
on a.c1= b.c1 or a.c2 contains b.c1
;
quit;
Output:
c1=Apple c2=Fruit_1,Fruit_2 last_used=2014 cc=Apple
c1=Carrot c2=Vegetable_1,Vegetable_2 last_used=2010 cc=Carrot
c1=Carrot c2=Vegetable_1,Vegetable_2 last_used=2018 cc=Vegetable_2
c1=Potato c2=Vegetable_1,Vegetable_2 last_used=2018 cc=Vegetable_2
c1=Orange c2=Fruit_1,Fruit_2 last_used= cc=
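Note that the output above leaves Orange unmatched and gives Carrot two rows, so it is not yet the desired result. A likely cause for the Orange row is that b.c1 is stored as $12 and is compared together with its trailing blanks, so only the last name in a.c2 can match. Below is a minimal sketch of one way to finish the job, assuming b.c2 always holds a four-digit year: trim b.c1 before the CONTAINS test and keep the largest year per row of file A. The table name want_latest is made up for illustration.

```sas
proc sql;
create table want_latest as
select a.c1,
       a.c2,
       max(input(b.c2, 4.)) as last_used   /* most recent year among all matches */
from file_a as a
     left join file_b as b
     on a.c1 = b.c1
     or a.c2 contains trim(b.c1)           /* trim() drops the trailing blanks of b.c1 */
group by a.c1, a.c2
;
quit;
```

With the dummy data this is intended to give 2014 for Apple, 2011 for Orange, and 2018 for both Carrot and Potato. Because CONTAINS is a substring test, a name such as Fruit_1 would also match Fruit_10, so a word-based comparison would be safer if such names can occur.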
Answer 1 (score: 1)
A MERGE-only approach requires the steps below.
Sample data
data foods;
length key1 $20 key2s $50;
input key1 key2s &; datalines;
Apple Fruit_1 Fruit_2
Orange Fruit_1 Fruit_2
Carrot Vegetable_1 Vegetable_2
Potato Vegetable_1 Vegetable_2
Knuckle Sandwich_1 Sandwich_2
run;
data dates;
length key $20 year 8;
input key year; datalines;
Apple 2014
Fruit_1 2011
Carrot 2010
Vegetable_2 2018
Grain_1 2009
run;
Pivot each row so that there is one row per key value
data food_single_keyed;
length key $20;
set foods;
rowid = _n_;
key = key1; output;
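do i = 1 by 1;
key = scan(key2s, i, ' '); /* i-th blank-delimited alternate name */
if missing(key) then leave; /* no more names on this row */
output;
if i > 10 then leave; /* safety cap; STOP here would end the whole DATA step */
end;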
drop i;
run;
Sort by key to prepare for the merge by key
proc sort data=food_single_keyed;
by key;
run;
proc sort data=dates;
by key;
run;
Merge by key
data foods_dated;
merge food_single_keyed dates;
by key;
run;
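The merge above also carries along keys that exist only in dates (Grain_1 in the sample data), with a missing rowid. As a hedged variant, those rows can be dropped during the merge itself with the IN= data set option. The flag name in_food is a hypothetical choice, not part of the original answer.

```sas
/* Sketch: keep only rows that come from the food table. */
data foods_dated;
merge food_single_keyed (in=in_food) dates;
by key;
if in_food; /* drops date-only keys such as Grain_1 */
run;
```

If this variant is used, the `if rowid;` filter in the final step becomes redundant but does no harm.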
Sort to prepare for the final selection
proc sort data=foods_dated;
by rowid descending year ;
run;
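Select only the first row for each rowid, which carries the most recent associated year. You can also keep the key column to see which value was used to select the year.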
data want (keep=key1 key2s year);
set foods_dated;
by rowid;
if rowid;
if first.rowid;
run;