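I have two tables that I need to join. They share only one common field (ID, and it is not unique). Is it possible to join the two tables so that each row is used only once, keeping each matching pair of rows together on a single row instead of producing every combination?
For example, I have the following two tables: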
+-------+----------+
| ID | NAME |
+-------+----------+
| A | Jack |
| A | Andy |
| A | Steve |
| A | Jay |
| B | Chris |
| B | Vicky |
| B | Emma |
+-------+----------+
And a second table, related to the first only through the ID column:
+-------+--------+
| ID | Age |
+-------+--------+
| A | 22 |
| A | 31 |
| A | 11 |
| B | 40 |
| B | 17 |
| B | 20 |
| B | 3 |
| B | 65 |
+-------+--------+
The final result I want is:
+-------+----------+-------+
| ID | NAME | Age |
+-------+----------+-------+
| A | Jack | 22 |
| A | Andy | 31 |
| A | Steve | 11 |
| A | Jay | null |
| B | Chris | 40 |
| B | Vicky | 17 |
| B | Emma | 20 |
| B | null | 3 |
| B | null | 65 |
+-------+----------+-------+
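Answer (score: 2)
This is the default behavior of a DATA step merge, except that once one table runs out of rows within an ID, it does not set that table's variables to missing (it keeps the values from its last row instead). That is easy to fudge, though.
There are other ways to do this; I think the best one is a hash object, if you are comfortable with those.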
data names;
infile datalines dlm='|';
input ID $ NAME $;
datalines;
| A | Jack |
| A | Andy |
| A | Steve |
| A | Jay |
| B | Chris |
| B | Vicky |
| B | Emma |
;;;;
run;
data ages;
infile datalines dlm='|';
input id $ age;
datalines;
| A | 22 |
| A | 31 |
| A | 11 |
| B | 40 |
| B | 17 |
| B | 20 |
| B | 3 |
| B | 65 |
;;;;
run;
data want;
merge names(in=_a) ages(in=_b);
by id;
if _a; *keep only ids that appear in names;
output; *write the current row explicitly;
call missing(name, age); *clear both after output: merge retains values from a table that has run out of rows in this id, so a cleared value stays missing unless a new row is read;
run;
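A sketch of one of the "other ways" mentioned above (a row counter, not the hash-object approach): number the rows within each ID in both tables, then merge on ID plus that counter, so each row pairs with exactly one row and unmatched slots come out missing automatically. It assumes the `names` and `ages` datasets from the steps above already exist and are sorted by `id`; `seq`, `names_seq`, `ages_seq` and `want_seq` are hypothetical names chosen for this illustration.

```sas
*number the rows within each id: seq is a hypothetical helper variable;
data names_seq;
set names;
by id;
if first.id then seq = 0;
seq + 1; *sum statement, so seq is retained from row to row;
run;

data ages_seq;
set ages;
by id;
if first.id then seq = 0;
seq + 1;
run;

*each (id, seq) pair is its own BY group, so a value with no partner row comes out missing;
data want_seq;
merge names_seq ages_seq;
by id seq;
drop seq;
run;
```

Unlike the merge above, this behaves like a full outer join: an ID that appears only in `ages` would also be kept. Add `in=` flags and a subsetting IF if you only want IDs from `names`.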