Deduplicating in PROC SQL based on the values in a column

Time: 2018-04-06 17:38:05

Tags: sql sas

I have the following data:

ID | Status
________________
1  | In Progress
2  | In Progress
3  | Done
3  | In Progress
4  | Backlog
5  | Backlog
5  | In Progress
6  | Done
7  | Backlog
7  | In Progress
7  | Done

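When an ID appears more than once, I want to keep only one row for it, chosen by the value in the Status column. For example, ID 3 has the statuses Done and In Progress; here I want to keep Done and discard In Progress. For ID 7, I want to keep Done and discard the other two statuses.

So the final result would be: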

 ID | Status
________________
1  | In Progress
2  | In Progress
3  | Done
4  | Backlog
5  | In Progress
6  | Done
7  | Done

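The tricky case is an ID such as 5, which has no Done row but does have In Progress.

I tried to do this with a CASE WHEN statement, but although I give it an order of importance, it also keeps the second option. So, if I write: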

SELECT CASE WHEN Status = 'Done' THEN 1
            WHEN Status = 'In Progress' THEN 1
            WHEN Status = 'Backlog' THEN 1
            ELSE 0
       END

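But I want to keep only the most important one, so it should take 7 | Done and ignore the other two statuses. For 5, however, it needs to take In Progress.

Any ideas?

3 Answers:

Answer 0 (score: 1)

Your idea of using a case expression is a good one. You want to combine it with aggregation.

For this problem: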

SELECT id,
       (CASE WHEN SUM(CASE WHEN Status = 'Done' THEN 1 ELSE 0 END) > 0 THEN 'Done'
             WHEN SUM(CASE WHEN Status = 'In Progress' THEN 1 ELSE 0 END) > 0 THEN 'In Progress'
             WHEN SUM(CASE WHEN Status = 'Backlog' THEN 1 ELSE 0 END) > 0 THEN 'Backlog'
             ELSE 'Unknown'
        END) as status
FROM t
GROUP BY id
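
For reference, here is a minimal sketch of how the same conditional aggregation might look inside PROC SQL. It assumes the input table is called have, with columns id and status (the names used in the other answers; the query above calls it t). The output name want is likewise only a placeholder. The sketch relies on SAS evaluating a comparison to 1 (true) or 0 (false), so MAX over the comparison is 1 when at least one row in the group matches.

```sas
proc sql;
  create table want as
  select id,
         case
           /* each MAX() is 1 if any row for this id has that status */
           when max(status = 'Done') = 1        then 'Done'
           when max(status = 'In Progress') = 1 then 'In Progress'
           when max(status = 'Backlog') = 1     then 'Backlog'
           else 'Unknown'
         end as status
  from have      /* assumed table name */
  group by id;
quit;
```

The WHEN branches are tested in order, so the first match wins. For ID 5 there is no Done row, so the first branch fails and the second returns In Progress.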

Answer 1 (score: 1)

Another solution is to rank the statuses and keep the highest-ranked one for each ID.

Query

proc sql; 
  create table want as
  select distinct id, status,
    case 
      when status = 'Done' then 3
      when status = 'In Progress' then 2
      when status = 'Backlog' then 1
      else 0
    end as rank
  from 
  have
  group by id
  having rank = max(rank);
quit;

If you don't want the computed rank value in the output, use want(drop=rank), or use a nested query and select only id and status from it.
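
The nested-query option could be sketched as follows. The inner query computes the rank, and the outer query filters on it without selecting it, so no helper column reaches the output. The names status_rank and ranked are hypothetical, chosen here for illustration; have and want are the table names from the query above.

```sas
proc sql;
  create table want as
  select id, status
  from (
    /* inner query: one row per distinct id/status pair, plus its rank */
    select distinct id, status,
      case
        when status = 'Done' then 3
        when status = 'In Progress' then 2
        when status = 'Backlog' then 1
        else 0
      end as status_rank
    from have
  ) as ranked
  group by id
  /* keep only the row whose rank is the highest within its id */
  having status_rank = max(status_rank);
quit;
```

Because status is selected but is not part of the GROUP BY, SAS should log a note about remerging summary statistics with the original data; that is expected for this kind of query.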

Data

data have;
infile cards dlm='|';
input id status $20.;
datalines;
1  | In Progress
2  | In Progress
3  | Done
3  | In Progress
4  | Backlog
5  | Backlog
5  | In Progress
6  | Done
7  | Backlog
7  | In Progress
7  | Done
run;

Answer 2 (score: 0)

Here is a solution that does not use PROC SQL:

 data have;
    input ID status & $15.;
    cards;
    1   In Progress
    2   In Progress
    3   Done
    3   In Progress
    4   Backlog
    5   Backlog
    5   In Progress
    6   Done
    7   Backlog
    7   In Progress
    7   Done
    ;
    run;

/* Assign a level of importance to each status */

data have;
set have;
if status ="Backlog" then importance=1;
if status ="In Progress" then importance=2;
if status ="Done" then importance=3;
run;

proc sort data=have;
by ID importance;
run;

data want;
set have;
by ID;
if last.ID;
drop importance;
run;