SAS - Finding sort rank within a data step

Date: 2016-08-02 18:05:54

Tags: sas

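I'm working with some SAS data and trying to figure out how to find each record's sort position within a data step, using as few steps as possible.
Here's an example: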

data Places;
   infile datalines delimiter=',';
   input state $ city $40. ;
   datalines;
WA,Seattle
OR,Portland
OR,Salem
OR,Tillamook
WA,Vancouver
;

Proc Sort data=WORK.PLACES;
    by STATE CITY;
run;

data WORK.PLACES;
    set WORK.PLACES;
    by STATE CITY;
    ST_CITY_RNK = _N_;
run;

Proc Sort data=WORK.PLACES;
    by CITY;
run;

data WORK.PLACES;
    set WORK.PLACES;
    by CITY;
    CITY_RNK = _N_;
run;

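In this example, is there a way to compute ST_CITY_RNK and CITY_RNK without sorting more than once? It feels like this should be possible with an ordered hash table, but I'm not sure how to go about it.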

Thanks!

2 answers:

Answer 0 (score: 1)

A hash table would work. Temporary arrays do roughly the same job and may be a bit easier.

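The main limitation of both is how you handle non-unique city names: Salem, Oregon and Salem, Massachusetts, for example. At the state-city level this is obviously fine (though who knows, some state might have more than one Lincoln), but at the city level alone you will certainly find several Columbias, Lincolns, Charlestons, and so on. My solution gives all of them the same sort rank, and then skips ahead by the number of ties (6, say) to the next city. The data step solution you posted above gives them unique ranks. A hash iterator could probably do either. With some effort you could adjust my approach to give unique ranks.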

data Places;
   infile datalines delimiter=',';
   input state $ city $40. ;
   datalines;
WA,Seattle
OR,Portland
OR,Salem
OR,Tillamook
WA,Vancouver
;
run;

data sortrank;

    *Init pairs of arrays: one stores the original values, the other gets mangled by sorting;
    *Unused slots in the sorted arrays are padded with ZZZZZ so they sort to the end
     (this assumes no real value sorts after ZZZZZ);
    *STATES_CITIES_SORTED is $50 so that "state,city" is not truncated;
  array states[32767] $ _temporary_;
  array states_cities_sorted[32767] $50 _temporary_ (32767*'ZZZZZ');
  array cities[32767] $40 _temporary_;
  array cities_sorted[32767] $40 _temporary_ (32767*'ZZZZZ');


    *Iterate over the dataset, load into arrays (afterwards _N_ holds the row count);
  do _n_ = 1 by 1 until (eof);
    set places end=eof;
    states[_n_] = state;
    states_cities_sorted[_n_] = catx(',',state,city);
    cities[_n_] = city;
    cities_sorted[_n_] = city;
  end;

    *Sort the to-be-sorted arrays;
  call sortc(of states_cities_sorted[*]);
  call sortc(of cities_sorted[*]);


  do _i = 1 to _n_;
        *For each array element, look up the rank with WHICHC, which returns the position of
         the first match of the unsorted value in the sorted list (so ties share a rank);
    city_rank = whichc(cities[_i],of cities_sorted[*]);
    state_cities_rank = whichc(catx(',',states[_i],cities[_i]),of states_cities_sorted[*]);
        *And put the array elements back in their proper variables;
    city = cities[_i];
    state = states[_i];
        *And finally output a row;
    output;
  end;
  drop _i;

run;
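To illustrate the tie handling described in this answer, here is a small sketch on hypothetical data (two rows named Salem, not the question's PLACES table). WHICHC returns the position of the first match, so tied values share a rank and the next value skips ahead. The unique-rank column shows one possible way to do the "adjust with some effort" tweak (an assumption, not the answerer's code): a second copy of the sorted array has each matched slot overwritten with a sentinel ('00'x, assumed never to occur in real data), so a repeated value finds the next slot. RANK_DEMO and the array names are hypothetical.

```sas
data rank_demo;
  length city $20;
  /* Sorted values (hypothetical data with a tie on Salem) */
  array sorted[4] $20 _temporary_ ('Portland' 'Salem' 'Salem' 'Seattle');
  /* Second copy of the sorted values, consumed as matches are found */
  array avail[4]  $20 _temporary_ ('Portland' 'Salem' 'Salem' 'Seattle');
  /* Rows to rank, in their original (unsorted) order */
  array rows[4]   $20 _temporary_ ('Salem' 'Seattle' 'Salem' 'Portland');

  do i = 1 to dim(rows);
    city = rows[i];
    /* WHICHC returns the FIRST matching position, so both Salems get 2 */
    tied_rank = whichc(city, of sorted[*]);
    /* Overwrite the matched slot with a sentinel so the next Salem finds slot 3 */
    unique_rank = whichc(city, of avail[*]);
    avail[unique_rank] = '00'x;
    output;
  end;
  drop i;
run;
```

With unique ranks, tied rows are numbered in the order they are read, which matches what the `_N_` approach in the question gives after PROC SORT (whose default EQUALS option keeps the input order within ties).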

Answer 1 (score: 0)

For reference, here is a hash approach:

data Places;
   infile datalines delimiter=',';
   input state $ city $40. ;
   datalines;
WA,Seattle
OR,Portland
OR,Salem
OR,Tillamook
WA,Vancouver
;
run;

data places;
    set places;
    if _n_ = 1 then do;
        declare hash h1(ordered:'a',dataset:'places');
        rc = h1.definekey('city');
        rc = h1.definedata('city');
        rc = h1.definedone();
        declare hiter hi1('h1');
        declare hash h2(ordered:'a',dataset:'places');
        rc = h2.definekey('state','city');
        rc = h2.definedata('state','city');
        rc = h2.definedone();
        declare hiter hi2('h2');
    end;
    *Save the current row, because the iterators overwrite STATE and CITY;
    t_city = city;
    t_state = state;
    *Walk the ordered keys until the current row is reached. WHILE is tested before
     each pass (UNTIL would be tested after NEXT, skipping the first key);
    rc = hi1.first();
    do city_rank = 1 by 1 while(t_city ne city);
        rc = hi1.next();
    end;
    rc = hi2.first();
    do state_city_rank = 1 by 1 while(t_city ne city or t_state ne state);
        rc = hi2.next();
    end;
    *Restore the current row;
    state = t_state;
    city = t_city;
    drop t_: rc;
run;
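As a variation on the hash approach (a sketch, not the answerer's code): rather than rescanning an iterator for every row, which is quadratic in the number of rows, each ordered hash can be walked once to number its keys. The ranks go into lookup hashes that are then queried with FIND for each row. PLACES_RANKED, CITY_RANK_H and SC_RANK_H are hypothetical names. Like the iterator version, tied keys share one rank with no gaps, because a hash keeps only one entry per key.

```sas
data places_ranked;
  /* Put STATE and CITY in the PDV at compile time (this SET never executes) */
  if 0 then set places;
  length city_rank state_city_rank rc 8;

  if _n_ = 1 then do;
    /* Ordered hashes of distinct keys, loaded straight from PLACES */
    declare hash hc(ordered:'a', dataset:'places');
    hc.definekey('city');
    hc.definedone();
    declare hiter ic('hc');

    declare hash hsc(ordered:'a', dataset:'places');
    hsc.definekey('state', 'city');
    hsc.definedone();
    declare hiter isc('hsc');

    /* Lookup tables (hypothetical names): key -> rank */
    declare hash city_rank_h();
    city_rank_h.definekey('city');
    city_rank_h.definedata('city_rank');
    city_rank_h.definedone();

    declare hash sc_rank_h();
    sc_rank_h.definekey('state', 'city');
    sc_rank_h.definedata('state_city_rank');
    sc_rank_h.definedone();

    /* Walk each ordered hash once, numbering keys in ascending order */
    city_rank = 0;
    rc = ic.first();
    do while (rc = 0);
      city_rank = city_rank + 1;
      rc = city_rank_h.add();
      rc = ic.next();
    end;

    state_city_rank = 0;
    rc = isc.first();
    do while (rc = 0);
      state_city_rank = state_city_rank + 1;
      rc = sc_rank_h.add();
      rc = isc.next();
    end;
  end;

  /* Main pass: read rows in their original order and look up both ranks */
  set places;
  rc = city_rank_h.find();
  rc = sc_rank_h.find();
  drop rc;
run;
```

The setup cost is one walk per ordered hash, after which each row needs only two FIND lookups; the trade-off is holding two extra hashes in memory.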