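I have a dataset with three columns: Name, System, and UserID. I want to count how many times a person appears in the report, but without counting rows that belong to different people who happen to share a name. What tells them apart is the UserID field, and only within a single system. If a name has multiple rows with the same System but different UserIDs, every observation with that name should be flagged for review. For this dataset, I would like to see the output shown below it.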
Name       System  UserID
John Doe   Sys1    [blank]
John Doe   Sys1    AB1234
John Doe   Sys2    AB2345
Jane Doe   Sys1    AA2345
Jane Doe   Sys1    AA23456
Jane Doe   Sys2    AA2345
Joe Smith  Sys1    JS963
Joe Smith  Sys2    JS741

Name       Count  System                      Follow-up
John Doe   1      Sys1 -                      Yes
John Doe   1      Sys1 - AB1234               Yes
John Doe   1      Sys2 - AB2345               Yes
Jane Doe   1      Sys1 - AA2345               Yes
Jane Doe   1      Sys1 - AA23456              Yes
Jane Doe   1      Sys2 - AA2345               Yes
Joe Smith  2      Sys1 - JS963, Sys2 - JS741  No
Any help is greatly appreciated!
My code is below. It currently just counts the names, and I don't know how to add the condition.
PROC SQL;
  CREATE TABLE Sorted_Master_Original AS
  SELECT Name,
         COUNT(Name) AS Total,
         System,
         UserID,
         CATX(' - ',System,UserID) AS SystemID
  FROM Master_Original
  WHERE Name <> ""
  GROUP BY Name;
QUIT;

DATA TESTDATA.Final_Listing;
  LENGTH SystemsAccessed $200.;
  DO UNTIL (last.Name);
    SET Sorted_Master_Original;
    BY Name NOTSORTED;
    SystemsAccessed=CATX(', ',SystemsAccessed,SystemID);
  END;
  DROP System SystemID;
RUN;
Answer 0 (score: 0)
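A situation where a signal is determined over a whole group and then applied to every member of that group can be handled with two DOW loops run one after the other. The first loop is coded the way you already have it, with a last. test on the loop and the SET and BY statements inside it. The second loop re-reads the same group, with the same number of iterations, through a separate SET buffer, and at that point the signal can be applied to each row.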
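The two-pass pattern described above can be sketched on a generic example before applying it to the question's data. This is only a minimal illustration, assuming the standard SASHELP.CLASS sample table is available; the names class_sorted, class_flagged, max_height, and tallest are hypothetical and not part of the original answer.

```sas
/* Sketch: derive a group-level value in pass 1, apply it per row in pass 2. */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

data class_flagged;
  length tallest $3;

  /* Pass 1: read one BY group; _n_ ends up as the group's row count. */
  do _n_ = 1 by 1 until (last.sex);
    set class_sorted;
    by sex;
    /* max_height is reset to missing at the start of each group */
    max_height = max(max_height, height);
  end;

  /* Pass 2: re-read the same _n_ rows through a second SET buffer. */
  do _n_ = 1 to _n_;
    set class_sorted;
    tallest = ifc(height = max_height, 'Yes', 'No');
    output;
  end;
run;
```

Because max_height is assigned rather than read by SET, it is reset at the top of each implicit DATA step iteration, so each group starts from a clean signal.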
Data
data have;
  length Name System UserID $20;
  /* Fields are read with the ampersand modifier, so they must be
     separated by two or more spaces; Name may contain single blanks */
  input Name & System & UserID; datalines;
John Doe  Sys1  .
John Doe  Sys1  AB1234
John Doe  Sys2  AB2345
Jane Doe  Sys1  AA2345
Jane Doe  Sys1  AA23456
Jane Doe  Sys2  AA2345
Joe Smith  Sys1  JS963
Joe Smith  Sys2  JS741
Bob Smith  Sys3  MS13
run;
Sort for BY-group processing
proc sort data=have;
  by Name System UserId;
run;
DATA step using sequential DOW loops
data want(keep=name count system_userid_list followup);
  * loop over name group;
  do _n_ = 1 by 1 until (last.name);
    set have;
    by name system userid;
    * tests of conditions within the group determine some signal;
    * check if there is more than one userid within a system within the name group;
    if not (first.system and last.system) then
      count = 1;
  end;

  * no conflict found, so count is the number of rows in the name group;
  if not count then count = _n_;

  length system_userid_list $200;
  followup = ifc(count=1 and _n_>1 ,'Yes','No');
  * followup 'signal' will be applied/available to each row of the group;
  * either output single row as followup, or concat to an aggregate list;
  * mixed output of singles and aggregates, good idea?;

  * reiterate over group in second SET buffer;
  do _n_ = 1 to _n_;
    set have;
    item = catx(' - ',system,userid);
    if count > 1 then
      system_userid_list = catx(',',system_userid_list,item);
    else do;
      system_userid_list = item;
      output;
    end;
  end;

  if count > 1 then output;
run;
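As a cross-check on the followup flag, the same rule can be expressed in PROC SQL. This is a different technique from the DOW loops above, offered only as a sketch. It assumes the have table from the Data step and that each (Name, System, UserID) row is unique, so more than one row for a name within a system means more than one ID, with a blank UserID counting as its own ID as in the desired output. The table name flagged and the alias names are hypothetical.

```sas
proc sql;
  create table flagged as
  select h.*,
         /* a name is flagged when any of its systems has more than one row */
         case when f.name is null then 'No' else 'Yes' end as followup
  from have as h
  left join
       (select distinct name
        from have
        group by name, system
        having count(*) > 1) as f
    on h.name = f.name;
quit;
```

Every row of a flagged name should carry followup = 'Yes' here, which is the same set of names the DATA step marks. Unlike the DATA step, this sketch keeps one row per input observation and does not build the concatenated list.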