Question

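I have a dataset with 3 columns: Name, System, UserID. I want to count the number of times a person shows up in a report, but not count rows that belong to different people who share the same name. The distinction is made by the UserID field, and only within a single system. If a single name has multiple rows with the same system and different user IDs, then all observations with that name should be flagged for review. For this dataset, I would like to see the output below.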

Name       System   UserID
John Doe   Sys1     [blank]
John Doe   Sys1     AB1234
John Doe   Sys2     AB2345
Jane Doe   Sys1     AA2345
Jane Doe   Sys1     AA23456
Jane Doe   Sys2     AA2345
Joe Smith  Sys1     JS963
Joe Smith  Sys2     JS741


Name       Count  System                      Follow-up
John Doe   1      Sys1 -                      Yes
John Doe   1      Sys1 - AB1234               Yes
John Doe   1      Sys2 - AB2345               Yes
Jane Doe   1      Sys1 - AA2345               Yes
Jane Doe   1      Sys1 - AA23456              Yes
Jane Doe   1      Sys2 - AA2345               Yes
Joe Smith  2      Sys1 - JS963, Sys2 - JS741  No

Any help is greatly appreciated!

My code is below. It currently just counts the names, and I don't know how to add the conditions.

PROC SQL;
     CREATE TABLE Sorted_Master_Original AS
     SELECT Name,
            COUNT(Name) AS Total,
            System,
            UserID,
            CATX(' - ',System,UserID) AS SystemID
     FROM Master_Original
     WHERE Name <> ""
     GROUP BY Name;
QUIT;

DATA TESTDATA.Final_Listing;
   LENGTH SystemsAccessed $200.;
   DO UNTIL (last.Name);
      SET Sorted_Master_Original;
      BY Name NOTSORTED;
      SystemsAccessed=CATX(', ',SystemsAccessed,SystemID);
   END;
   DROP System SystemID;
RUN;

Answer 1

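When a signal has to be determined over a whole group and then applied to every member of that group, you can use two DOW loops in sequence. The first is the one you have already coded: a loop that tests last. and contains the SET and BY statements. The second iterates over the same group again, in a loop of the same size, reading through a separate SET buffer. By that point the signal is known and can be applied to each row.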
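As a stripped-down illustration of the pattern, here is a minimal sketch on toy data. The dataset names `toy` and `toy_flagged` and the variables `id`, `value`, and `group_total` are hypothetical and exist only for this example; the group-level "signal" is simply a sum.

```sas
* Hypothetical toy data, already grouped by id;
data toy;
  input id $ value;
  datalines;
A 1
A 3
B 10
;
run;

data toy_flagged;
  * Pass 1: read the whole group and accumulate a group-level signal;
  do _n_ = 1 by 1 until (last.id);
    set toy;
    by id;
    group_total = sum(group_total, value);
  end;

  * Pass 2: re-read the same _n_ rows through a second SET statement;
  * group_total is now known, so it is written to every row of the group;
  do _n_ = 1 to _n_;
    set toy;
    output;
  end;
run;
```

Each SET statement keeps its own read position, so the second loop replays exactly the rows the first loop consumed. With this input, both rows of group A should carry group_total = 4 and the single row of group B should carry group_total = 10.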

Data

data have;
length Name System UserID $20;
input Name & System & UserID; datalines;
John Doe   Sys1     .
John Doe   Sys1     AB1234
John Doe   Sys2     AB2345
Jane Doe   Sys1     AA2345
Jane Doe   Sys1     AA23456
Jane Doe   Sys2     AA2345
Joe Smith  Sys1     JS963
Joe Smith  Sys2     JS741
Bob Smith  Sys3     MS13
run;

Ordering for BY-group processing

proc sort data=have;
  by Name System UserId;
run;

DATA step using sequential DOW loops

data want(keep=name count system_userid_list followup);
  * loop over name group;
  do _n_ = 1 by 1 until (last.name);
    set have;
    by name system userid;

    * tests of conditions within the group determine some signal;

    * check if there is more than one userid within a system within the name group;
    if not (first.system and last.system) then
      count = 1;
  end;

  if not count then count = _n_;

  * give followup an explicit length, otherwise IFC defaults it to $200;
  length system_userid_list $200 followup $3;
  followup = ifc(count=1 and _n_>1 ,'Yes','No');

  * followup 'signal' will be applied/available to each row of the group;

  * either output single row as followup, or concat to an aggregate list;
  * mixed output of singles and aggregates, good idea?;

  * reiterate over group in second SET buffer;
  do _n_ = 1 to _n_;
    set have;
    item = catx(' - ',system,userid);
    if count > 1 then
      system_userid_list = catx(', ',system_userid_list,item);
    else do;
      system_userid_list = item;
      output;
    end;
  end;

  if count > 1 then output;
run;
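
Since the question started from PROC SQL, the follow-up flag alone could also be derived there. This is only a sketch of an alternative, not part of the DOW-loop approach above; the names `name_flags`, `n_ids`, and `needs_followup` are hypothetical, and `have` is the dataset created earlier. It assumes that a blank UserID should count as its own distinct value, as in the John Doe rows of the expected output.

```sas
proc sql;
  create table name_flags as
  select Name,
         /* 1 if any system under this name has more than one distinct UserID */
         max(n_ids > 1) as needs_followup
  from (select Name,
               System,
               /* COUNT(DISTINCT ...) skips missing values, so prefix a   */
               /* marker character to make a blank UserID count as a value */
               count(distinct cats('#', UserID)) as n_ids
        from have
        group by Name, System)
  group by Name;
quit;
```

One difference in behavior is worth noting: the `first.system and last.system` test in the DATA step flags a name whenever a system has more than one row, even if those rows repeat the same UserID, whereas this query flags only when the UserIDs within a system actually differ. Which of the two is wanted depends on whether exact duplicate rows can occur in the report.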
