Question

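I'm trying to join two tables with proc sql without creating duplicate rows (I'm not sure whether there is a more efficient way to do this).

Inner join on: datepart(table1.date) = datepart(table2.date) AND tag = tag AND ID = ID

I think the problem is the dates and the different names in table 1. Looking at the tables, it is clear that row 1 of table 1 should be joined with row 1 of table 2, because the transaction starts at 00:04 in table 1 and ends at 00:06 in table 2. The issue I'm having is that I can't join on the dates with the timestamps included, so I strip the timestamps, and that is what creates the duplicates.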

Table1:

id tag    date            amount   name_x
1 23      01JUL2018:00:04  12          smith ltd
1 23      01JUL2018:00:09  12          anna smith



Table2:



id tag  ref   amount   date
1 23   19   12          01JUL2018:00:06:00
1 23   20   12          01JUL2018:00:10:00



Desired output:

id tag    date            amount   name_x       ref
1 23      01JUL2018  12          smith ltd       19
1 23      01JUL2018  12          anna smith      20

Thanks for your help!

Answer 1

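You need to put boundaries on that datetime join. Your guess about why you are getting duplicates is correct. I would assume the lower bound is the previous datetime, if one exists, and the upper bound is the datetime of the current record.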
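To see where the duplicates come from, here is a small sketch of the date-only join from the question. It assumes table1 and table2 already exist as SAS datasets holding the sample rows from the question, and that date is a datetime value in both. Since every sample row falls on 01JUL2018, each table1 row matches each table2 row with the same id and tag, so with that sample data the join returns four rows instead of two.

```sas
/* Date-only join: the time of day is discarded, so all rows
   on the same calendar day match each other. */
proc sql;
  select a.id, a.tag, a.date, a.name_x, b.ref
  from table1 as a
    inner join
       table2 as b
    on a.id = b.id
    and a.tag = b.tag
    and datepart(a.date) = datepart(b.date);
quit;
```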

By the way, this is poor database design on someone's part...

First, we sort table2 by id, tag and date.

proc sort data=table2 out=temp;
by id tag date;
run;

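Now write a data step that adds the previous date within each unique id/tag combination.

data temp;
set temp;
by id tag;
format low_date datetime20.;
retain p_date;   /* carries the previous row's datetime forward */

/* the first row of an id/tag group has no previous datetime, so use 0 */
if first.tag then
   p_date = 0;

low_date = p_date;   /* lower bound: previous row's datetime */
p_date = date;       /* remember this row's datetime for the next row */
run;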
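As an alternative sketch, the same lower bound can be built with the lag function instead of retain. This assumes table2 has the columns shown in the question; the dataset names sorted2 and temp_lag are made up for illustration. Note that lag has to be called on every row (not inside the if), otherwise its internal queue falls out of step with the data.

```sas
proc sort data=table2 out=sorted2;
  by id tag date;
run;

data temp_lag;
  set sorted2;
  by id tag;
  format low_date datetime20.;
  low_date = lag(date);             /* previous row's datetime */
  if first.tag then low_date = 0;   /* no previous row in this id/tag group */
run;
```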

Now update your join to use the date range.

proc sql noprint;
create table want as
select a.id, a.tag, a.date, a.amount, a.name_x, b.ref
from table1 as a
  inner join
     temp as b
  on a.id = b.id
  and a.tag = b.tag
  and b.low_date < a.date <= b.date;
quit;
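A quick way to check the result is to look for keys that still appear more than once. This is only a sketch and assumes the want table created above; an empty listing suggests that each table1 row matched at most one range. Keep in mind that with an inner join, any table1 row whose datetime is later than the last table2 datetime for its id/tag will be missing from want entirely.

```sas
/* List any id/tag/date combination that still has more than one row. */
proc sql;
  select id, tag, date, count(*) as n_rows
  from want
  group by id, tag, date
  having count(*) > 1;
quit;
```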

Answer 2

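If I understand correctly, you want to merge by ID, tag and the closest pair of dates. That means 01JUL2018:00:04 in table1 is closest to 01JUL2018:00:06:00 in table2, and 01JUL2018:00:09 is closest to 01JUL2018:00:10:00. You could try this: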

data table1;
input id tag date:datetime21.   amount   name_x $15.;
format date datetime21.;
cards;
1 23 01JUL2018:00:04 12 smith ltd
1 23 01JUL2018:00:09 12 anna smith
;

data table2;
input id tag  ref   amount   date: datetime21.;
format date datetime21.;
cards;
1 23 19 12 01JUL2018:00:06:00
1 23 20 12 01JUL2018:00:10:00
;


proc sql;
   select a.*,b.ref from table1 a inner join table2 b
   on a.id=b.id and a.tag=b.tag
   group by a.id,a.tag,a.date
   having abs(a.date-b.date)=min(abs(a.date-b.date));
quit;
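One thing to watch with the nearest-date approach: if a table1 row is exactly the same distance from two table2 rows, both of them satisfy the having clause and you get duplicates again. Below is a hedged sketch that saves the result to a table and then keeps one row per id/tag/date. The tie-break rule (keep the lowest ref) and the dataset names nearest and nearest_unique are my own assumptions, not something from the question.

```sas
proc sql;
  create table nearest as
  select a.*, b.ref
  from table1 a inner join table2 b
    on a.id = b.id and a.tag = b.tag
  group by a.id, a.tag, a.date
  having abs(a.date - b.date) = min(abs(a.date - b.date))
  order by a.id, a.tag, a.date, b.ref;
quit;

/* nodupkey keeps the first row of each by-group, i.e. the lowest ref on a tie */
proc sort data=nearest out=nearest_unique nodupkey;
  by id tag date;
run;
```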

SAS proc sql inner join without duplicates
