How to match data not in a group (SQL or SAS)

Time: 2017-05-21 00:08:38

Tags: sql sas

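The main table X1 has two columns, A and B. A is a date and B is a person's name. I want to find the people who have no later record. For example, Tom and Liliy have records on 5/2, and I want to find the people from before 5/2 other than them. I tried joining the table against the earlier data, but I don't know how to exclude the names that also appear in the history.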

Table X1             
A   |   B               
5/1     Tom             
5/1     Liliy           
5/1     Jack            
5/2     Tom
5/2     Liliy

Here is my expected result:     Table X2
    A   |   B
    5/2     Jack

3 Answers:

Answer 0: (score: 0)

If you're looking for everyone other than those two people, and assuming A is actually a date column, you could run the following SQL query.

SELECT * FROM X1 WHERE A < Convert(datetime, '5/2') AND B NOT IN ('Tom', 'Liliy');
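
The hard-coded name list in the query above could instead be derived from the table itself. Below is a minimal sketch of that idea, not a definitive implementation. It assumes SQLite through Python's built-in sqlite3 module rather than the original poster's database, and it assumes the dates fall in 2017, since the question gives no year. The function name absent_on is hypothetical. It returns the rows dated before a cutoff whose name has no row on the cutoff date.

```python
import sqlite3

# Sample rows from the question; the year 2017 is an assumption.
ROWS = [
    ("2017-05-01", "Tom"),
    ("2017-05-01", "Liliy"),
    ("2017-05-01", "Jack"),
    ("2017-05-02", "Tom"),
    ("2017-05-02", "Liliy"),
]

# Keep earlier rows (p) only when no row (c) exists for the same name on the cutoff date.
QUERY = """
SELECT p.A, p.B
FROM X1 AS p
WHERE p.A < :cutoff
  AND NOT EXISTS (
      SELECT 1 FROM X1 AS c
      WHERE c.A = :cutoff AND c.B = p.B
  )
ORDER BY p.A, p.B
"""


def absent_on(cutoff, rows=ROWS):
    """Return (A, B) rows dated before `cutoff` whose name has no row on `cutoff`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE X1 (A TEXT, B TEXT)")
        conn.executemany("INSERT INTO X1 VALUES (?, ?)", rows)
        return conn.execute(QUERY, {"cutoff": cutoff}).fetchall()
    finally:
        conn.close()


print(absent_on("2017-05-02"))
```

Storing the dates as ISO text keeps the string comparison in step with chronological order, which is why the sketch does not need a date conversion function.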

Answer 1: (score: 0)

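I believe this is what you're looking for. You'll get the date and name of each person who appears only once in the column. If there are multiple instances of the same name, COUNT(B) will be greater than one.

-- A is aggregated so the query is also valid where non-grouped columns are rejected
SELECT MIN(A) AS A, B FROM X1 GROUP BY B HAVING COUNT(B) = 1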
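
To see the counting approach run against the question's sample rows, here is a hedged sketch using Python's built-in sqlite3 module. The choice of SQLite, the year 2017, and the function name single_record_people are all assumptions made for illustration. A is wrapped in MIN so the query stays valid on engines that require every selected column to be grouped or aggregated.

```python
import sqlite3

# Sample rows from the question; the year 2017 is an assumption.
ROWS = [
    ("2017-05-01", "Tom"),
    ("2017-05-01", "Liliy"),
    ("2017-05-01", "Jack"),
    ("2017-05-02", "Tom"),
    ("2017-05-02", "Liliy"),
]

# One output row per name that occurs exactly once in the whole table.
QUERY = """
SELECT MIN(A) AS A, B
FROM X1
GROUP BY B
HAVING COUNT(*) = 1
ORDER BY B
"""


def single_record_people(rows=ROWS):
    """Return (A, B) for every name that has exactly one row in X1."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE X1 (A TEXT, B TEXT)")
        conn.executemany("INSERT INTO X1 VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(single_record_people())
```

Note that this counts rows over the whole table, so it finds names that occur exactly once overall. It does not compare against a particular date.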

Answer 2: (score: 0)

Based on your question, I can only think of an answer similar to @Jonathan Porter's.

The example below uses a larger starting dataset:

/*Generate a more diverse starting dataset*/
data X1;
    format A mmddyy9.;
    array names [10] $ 20 ('Phillip','Wendi','Virgilio','Scotty','Melony','Rayford','Vince','Cody','Julissa','Marquerite');
    do i=1 to 30;
        A = intnx('DAY', '01MAY2017'd, ceil(ranuni(3)*10));
        B = names[ceil(ranuni(3)*10)];
        output;
    end;
    drop names: i;
run;
proc sort data=X1 nodups;
    by A B;
run;

Starting dataset X1:

A           B
----------------------
5/2/2017    Rayford
5/2/2017    Virgilio
5/3/2017    Cody
5/3/2017    Melony
5/3/2017    Rayford
5/3/2017    Wendi
5/4/2017    Scotty
5/5/2017    Rayford
5/5/2017    Wendi
5/6/2017    Rayford
5/6/2017    Vince
5/7/2017    Marquerite
5/7/2017    Virgilio
5/8/2017    Julissa
5/8/2017    Rayford
5/9/2017    Cody
5/9/2017    Julissa
5/9/2017    Marquerite
5/9/2017    Virgilio
5/9/2017    Wendi
5/10/2017   Julissa
5/10/2017   Phillip
5/11/2017   Cody
5/11/2017   Vince

Group by name, retrieving the count and the latest associated date:

proc sql;
    create table X2 as
    select max(A) as latest_A format=mmddyy9., /*Latest date to which a name (B) is associated*/
           B,                                  
           count(*) as count_B                 /*Count of records associated with a given name (B)*/
    from X1
    group by B;
quit;
proc sort data=X2;
    by latest_A B;
run;

As Jonathan said, I assume you're interested in the starred records in X2 below:

latest_A    B           count_B
------------------------------------
5/3/2017    Melony      1       ***
5/4/2017    Scotty      1       ***
5/8/2017    Rayford     5
5/9/2017    Marquerite  2
5/9/2017    Virgilio    3
5/9/2017    Wendi       3
5/10/2017   Julissa     3
5/10/2017   Phillip     1       ***
5/11/2017   Cody        3
5/11/2017   Vince       2
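
The starred records can also be selected directly by filtering on the count. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module instead of SAS, copies the X1 rows from the listing above, and introduces the hypothetical helper names load_rows and starred_records.

```python
import sqlite3
from datetime import datetime

# Rows copied from the X1 listing above (date A, name B).
X1_TEXT = """
5/2/2017 Rayford
5/2/2017 Virgilio
5/3/2017 Cody
5/3/2017 Melony
5/3/2017 Rayford
5/3/2017 Wendi
5/4/2017 Scotty
5/5/2017 Rayford
5/5/2017 Wendi
5/6/2017 Rayford
5/6/2017 Vince
5/7/2017 Marquerite
5/7/2017 Virgilio
5/8/2017 Julissa
5/8/2017 Rayford
5/9/2017 Cody
5/9/2017 Julissa
5/9/2017 Marquerite
5/9/2017 Virgilio
5/9/2017 Wendi
5/10/2017 Julissa
5/10/2017 Phillip
5/11/2017 Cody
5/11/2017 Vince
"""

# Same grouping as the PROC SQL step, keeping only names with a single record.
QUERY = """
SELECT MAX(A) AS latest_A, B, COUNT(*) AS count_B
FROM X1
GROUP BY B
HAVING COUNT(*) = 1
ORDER BY latest_A, B
"""


def load_rows(text=X1_TEXT):
    """Parse 'm/d/yyyy name' lines into (ISO date, name) tuples."""
    rows = []
    for line in text.strip().splitlines():
        date_str, name = line.split()
        iso = datetime.strptime(date_str, "%m/%d/%Y").date().isoformat()
        rows.append((iso, name))
    return rows


def starred_records(rows=None):
    """Return (latest_A, B, count_B) for names that appear exactly once."""
    rows = load_rows() if rows is None else rows
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE X1 (A TEXT, B TEXT)")
        conn.executemany("INSERT INTO X1 VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for record in starred_records():
    print(record)
```

Dates are converted to ISO text before loading so that MAX and ORDER BY sort them chronologically as plain strings.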