Question

I have two tables and need to create another table that combines data from both of them:

first_table:
id    term
3     2014
3     2015
4     2014
4     2015
5     2013
5     2014
6     2013
6     2014

second_table:
id    term    majr_code
3     2010    ACT
3     2010    ACT
3     2011    GNST
3     2015    BUSA
3     2015    BUSA
4     2009    TIM
4     2010    BAL
4     2014    TAR
5     2011    SAR
5     2013    COR
6     2010    PAT
6     2013    TOR

These are my two tables. I need to create another table that is the same as the first table, with an added majr_code column:

Desired result (first_table with an added majr_code column):
id    term    majr_code
3     2014    GNST
3     2015    BUSA
4     2014    TAR
4     2015    TAR
5     2013    COR
5     2014    COR
6     2013    TOR
6     2014    TOR

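What I need is this: for the same id, if the second table has the same term as the first table, I keep that majr_code. If it doesn't, I use the majr_code from the latest earlier term. For example, if the first table has 2014 and the second table has 2011 and 2015, I need to use the 2011 majr_code for the 2014 term. As another example, if the first table has terms 2013 and 2014 for the same id and the highest term in the second table is 2013, I keep the 2013 majr_code for both 2013 and 2014.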
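The rule above is an "as-of" lookup. For each id and term in first_table, it takes the majr_code from the latest second_table term that is not later than that term. Below is a hedged illustration of that rule only. It is written in Python rather than SAS, uses the sample data from the question, and the function name `as_of_lookup` is hypothetical:

```python
from bisect import bisect_right
from collections import defaultdict

# Sample data copied from the tables in the question.
first_table = [(3, 2014), (3, 2015), (4, 2014), (4, 2015),
               (5, 2013), (5, 2014), (6, 2013), (6, 2014)]
second_table = [(3, 2010, "ACT"), (3, 2010, "ACT"), (3, 2011, "GNST"),
                (3, 2015, "BUSA"), (3, 2015, "BUSA"), (4, 2009, "TIM"),
                (4, 2010, "BAL"), (4, 2014, "TAR"), (5, 2011, "SAR"),
                (5, 2013, "COR"), (6, 2010, "PAT"), (6, 2013, "TOR")]


def as_of_lookup(first, second):
    """Hypothetical helper: for each (id, term) in `first`, return the
    majr_code from `second` with the same id and the largest term that is
    <= the requested term, or None if there is no such row."""
    # Group second-table rows by id, in ascending term order.
    by_id = defaultdict(list)
    for id_, term, code in sorted(second):
        by_id[id_].append((term, code))
    result = []
    for id_, term in first:
        rows = by_id.get(id_, [])
        terms = [t for t, _ in rows]
        # Index of the last row whose term is <= the requested term.
        i = bisect_right(terms, term) - 1
        result.append((id_, term, rows[i][1] if i >= 0 else None))
    return result


for row in as_of_lookup(first_table, second_table):
    print(row)
```

On the sample data this should reproduce the desired result table. If two different codes share the same id and term, this sketch keeps the one that sorts last alphabetically, which is an arbitrary choice.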

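I know it's complicated, but it should be clearer if you look at the tables and the desired result. If it's still too confusing, I can delete the question. This is the best way I can explain it. Thanks!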

Answer 1

I think the following code should do the trick. It works as follows:

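1) Read in the sample data sets.

2) Create a table called second_table_nogaps, which is just second_table with no gaps between years, up through 2015. Basically, for each id in the second table, it checks whether a record exists for a given year. If it does, the record is output; if not, a new record is created using the previous year's majr_code. If the last record for a given id is not 2015, new records are generated up through 2015 (e.g., a new record is created for id = 4, year = 2015, majr_code = TAR).

3) Merge the unique id + term + majr_code values onto first_table. The resulting table, First_table_2, should be what you're looking for! Be careful, though: if the same id + term has more than one majr_code, this step will produce duplicate rows.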
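Steps 2 and 3 can be sketched outside SAS as follows. This is a minimal Python illustration under the same assumptions as the SAS code, with the end year 2015 hard-coded. The names `fill_gaps`, `LAST_TERM`, and `first_table_2` are illustrative. Exact duplicate rows are dropped up front here instead of with `select distinct` at the join:

```python
from itertools import groupby

LAST_TERM = 2015  # assumption: hard-coded final year, as in the SAS answer

first_table = [(3, 2014), (3, 2015), (4, 2014), (4, 2015),
               (5, 2013), (5, 2014), (6, 2013), (6, 2014)]
second_table = [(3, 2010, "ACT"), (3, 2010, "ACT"), (3, 2011, "GNST"),
                (3, 2015, "BUSA"), (3, 2015, "BUSA"), (4, 2009, "TIM"),
                (4, 2010, "BAL"), (4, 2014, "TAR"), (5, 2011, "SAR"),
                (5, 2013, "COR"), (6, 2010, "PAT"), (6, 2013, "TOR")]


def fill_gaps(rows, last_term=LAST_TERM):
    """Step 2: give every id one record per year from its first term through
    last_term, carrying the previous majr_code into missing years."""
    out = []
    # Drop exact duplicates, then process one id at a time in term order.
    for id_, group in groupby(sorted(set(rows)), key=lambda r: r[0]):
        group = list(group)
        for (_, term, code), nxt in zip(group, group[1:] + [None]):
            # A code holds until the year before the id's next record;
            # the id's last record is extended through last_term.
            end = nxt[1] - 1 if nxt is not None else max(term, last_term)
            for year in range(term, end + 1):
                out.append((id_, year, code))
    return out


# Step 3: left join first_table to the gap-free table on (id, term).
lookup = {(id_, term): code for id_, term, code in fill_gaps(second_table)}
first_table_2 = [(id_, term, lookup.get((id_, term))) for id_, term in first_table]
print(first_table_2)
```

Because each (id, term) appears at most once in the filled table, the lookup cannot create the duplicates that step 3 warns about. An (id, term) with no earlier record gets None, which plays the role of a missing value.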

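Hope this helps! The code in step 2 could probably be simplified, since my handling of the first and last records is not particularly efficient.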

data first_table;
    infile datalines;
    input id term;
    datalines;
        3      2014
        3      2015
        4      2014
        4      2015
        5      2013
        5      2014
        6      2013
        6      2014
    ;
run;


data second_table;
    infile datalines ;
    input id term majr_code $;
    datalines ;
        3   2010    ACT
        3   2010    ACT
        3   2011    GNST
        3   2015    BUSA
        3   2015    BUSA
        4   2009    TIM
        4   2010    BAL
        4   2014    TAR
        5   2011    SAR
        5   2013    COR
        6   2010    PAT
        6   2013    TOR
    ;
run;

proc sort data=second_table ; by id term; run;

data second_table_nogaps (keep=id_nogaps term_nogaps majr_code_nogaps);
    set second_table end=eof;
    retain id_nogaps term_nogaps majr_code_nogaps;

    *first set up the first row... establishes retained variables and outputs;
    if _N_ = 1 then do;
        id_nogaps = id;
        term_nogaps = term;
        majr_code_nogaps = majr_code;
        output;
    end;

    *for all but the first and last row;
    else if not eof then do;
        do while ( (term_nogaps + 1 < term) /*fill in gaps between years (e.g. major code in 2011 and major code in 2014 within the same id)*/
                   or
                   ((id_nogaps ne id) and term_nogaps < 2015) /*fill the previous id's major code for all terms up through 2015 (e.g. last major code for id 4 is in 2014)*/
                 );
            term_nogaps = term_nogaps + 1;
            output;
        end;

        id_nogaps = id;
        term_nogaps = term;
        majr_code_nogaps = majr_code;
        output;
    end;

    *last row: same gap filling as above, then fill this id up through 2015;
    else do;
        do while ( (term_nogaps + 1 < term)
                   or
                   ((id_nogaps ne id) and term_nogaps < 2015) /*also handles a last row that starts a new id*/
                 );
            term_nogaps = term_nogaps + 1;
            output;
        end;
        id_nogaps = id;
        term_nogaps = term;
        majr_code_nogaps = majr_code;
        output;
        do while (term_nogaps < 2015);
            term_nogaps = term_nogaps + 1;
            output;
        end;
    end;
run;

proc sql;
    create table First_table_2 as 
    Select a.* , b.majr_code_nogaps as majr_code
    from first_table a
        left join 
            (select distinct id_nogaps, term_nogaps, majr_code_nogaps from second_table_nogaps) b /*select distinct values to prevent duplication*/
    on a.id   =   b.id_nogaps  and a.term = b.term_nogaps;
quit;

Answer 2

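There are a few ways to do this, but SQL is probably the simplest. You didn't provide any code, so I'm just including a pointer: after grouping the table (by id), filter it with having term=max(term).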

SAS coding: select the max of a variable
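Answer 2's pointer (group by id, then `having term=max(term)`) keeps each id's latest second_table row. The small Python sketch below shows what that filter returns on the sample data. The helper name `rows_at_max_term` is hypothetical; in SAS this would be a PROC SQL query grouped by id:

```python
second_table = [(3, 2010, "ACT"), (3, 2010, "ACT"), (3, 2011, "GNST"),
                (3, 2015, "BUSA"), (3, 2015, "BUSA"), (4, 2009, "TIM"),
                (4, 2010, "BAL"), (4, 2014, "TAR"), (5, 2011, "SAR"),
                (5, 2013, "COR"), (6, 2010, "PAT"), (6, 2013, "TOR")]


def rows_at_max_term(rows):
    """Hypothetical helper mirroring SQL `GROUP BY id HAVING term = MAX(term)`:
    keep only rows whose term equals the maximum term for their id."""
    max_term = {}
    for id_, term, _ in rows:
        max_term[id_] = max(max_term.get(id_, term), term)
    return [r for r in rows if r[1] == max_term[r[0]]]


print(rows_at_max_term(second_table))
```

On this sample, the latest code per id matches the desired result for every row except id 3, term 2014, where the question expects GNST rather than BUSA. To reproduce the desired table, the max would have to be restricted to terms not later than each first_table term.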
