SAS or PostgreSQL: add a column based on another column's value

Time: 2018-03-02 06:35:27

Tags: sql postgresql sas

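I want to add a column "index" so that rows with the same value in the column "variable" get the same "index" value. Either PostgreSQL or SAS syntax is fine.

One catch: the values in the "variable" column change every day, as tableA and tableB show, so hard-coding them is not acceptable. Any suggestions are appreciated!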

+----------+--------------+------+-------+-----+-----+-------+
| variable | new_variable | type | start | end | woe | index |
+----------+--------------+------+-------+-----+-----+-------+
| A        | mi_A         | char | 1     |     | 1.3 |     1 |
| A        | mi_A         | char | 0     |     | 0.6 |     1 |
| B        | mi_B         | char | 1     |     | 5.4 |     2 |
| B        | mi_B         | char | 0     |     | 0.1 |     2 |
| gnd_cd   | gnd_cd       | char | 3     |     | 1.3 |     3 |
| gnd_cd   | gnd_cd       | char | @0    |     | 0.6 |     3 |
| gnd_cd   | gnd_cd       | char | 2     |     | 5.4 |     3 |
| gnd_cd   | gnd_cd       | char | N     |     | 0.1 |     3 |
| gnd_cd   | gnd_cd       | char | 1     |     | 1.3 |     3 |
| gnd_cd   | gnd_cd       | char | 99    |     | 0.6 |     3 |
| mar_sign | mar_sign     | char | 0     |     | 5.4 |     4 |
| mar_sign | mar_sign     | char | Y     |     | 0.1 |     4 |
| mar_sign | mar_sign     | char | N     |     |   6 |     4 |
| C        | C            | char | 6     |     |   2 |     5 |
| C        | C            | char | 7     |     | 2.1 |     5 |
| C        | C            | char | 8     |     | 2.2 |     5 |
+----------+--------------+------+-------+-----+-----+-------+
                         (tableA)

+--------------+--------------+------+-------+-----+-----+-------+
|   variable   | new_variable | type | start | end | woe | index |
+--------------+--------------+------+-------+-----+-----+-------+
| D            | mi_D         | char | 1     |     |   1 |     1 |
| D            | mi_D         | char | 0     |     |   2 |     1 |
| E            | mi_E         | char | 1     |     |   2 |     2 |
| E            | mi_E         | char | 0     |     |   3 |     2 |
| education_bg | education_bg | char | 3     |     |   1 |     3 |
| education_bg | education_bg | char | @0    |     |   5 |     3 |
| education_bg | education_bg | char | 2     |     |   6 |     3 |
| education_bg | education_bg | char | N     |     |   4 |     3 |
| education_bg | education_bg | char | 1     |     |   3 |     3 |
| education_bg | education_bg | char | 99    |     |   3 |     3 |
| sex          | sex          | char | 0     |     |   2 |     4 |
| sex          | sex          | char | Y     |     |   1 |     4 |
| sex          | sex          | char | N     |     |   0 |     4 |
| C            | C            | char | 6     |     |   6 |     5 |
| C            | C            | char | 7     |     |   4 |     5 |
| C            | C            | char | 8     |     |   1 |     5 |
+--------------+--------------+------+-------+-----+-----+-------+
                             (tableB)

3 Answers:

Answer 0: (score: 2)

You can do this in SAS in a single data step, using a RETAIN statement together with BY-group processing on variable.

Code:

data have;
infile datalines dlm='|';
input variable $ new_variable $ type $ start $  end $ woe ;
datalines;
| A        | mi_A         | char | 1     |     | 1.3 
| A        | mi_A         | char | 0     |     | 0.6 
| B        | mi_B         | char | 1     |     | 5.4 
| B        | mi_B         | char | 0     |     | 0.1 
| gnd_cd   | gnd_cd       | char | 3     |     | 1.3 
| gnd_cd   | gnd_cd       | char | @0    |     | 0.6 
| gnd_cd   | gnd_cd       | char | 2     |     | 5.4 
| gnd_cd   | gnd_cd       | char | N     |     | 0.1 
| gnd_cd   | gnd_cd       | char | 1     |     | 1.3 
| gnd_cd   | gnd_cd       | char | 99    |     | 0.6 
| mar_sign | mar_sign     | char | 0     |     | 5.4 
| mar_sign | mar_sign     | char | Y     |     | 0.1 
| mar_sign | mar_sign     | char | N     |     |   6 
| C        | C            | char | 6     |     |   2 
| C        | C            | char | 7     |     | 2.1 
| C        | C            | char | 8     |     | 2.2  
run;

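data want;
set have ;
/* NOTSORTED: a group is a run of consecutive equal values; no prior sort needed */
by variable notsorted;
/* the sum statement below already retains index, so this RETAIN is optional */
retain index;
/* add 1 on the first row of each group, so the first group gets index 1 */
if first.variable then index+1;
run;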

Note: I create index and increment it only when a new group value starts.
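
For readers who do not use SAS, the run-based numbering described above can be sketched in Python. This is only an illustration of the logic, not the answerer's code: the function name add_run_index and the trimmed sample rows are assumptions made for the example.

```python
from itertools import groupby


def add_run_index(rows, key="variable"):
    """Return copies of rows with an 'index' field numbering runs of equal keys."""
    out = []
    # groupby only merges adjacent equal keys, much like BY ... NOTSORTED.
    runs = groupby(rows, key=lambda r: r[key])
    for index, (_, run) in enumerate(runs, start=1):
        for row in run:
            out.append({**row, "index": index})
    return out


# Hypothetical subset of tableA, in its original row order.
rows = [
    {"variable": "A", "start": "1"},
    {"variable": "A", "start": "0"},
    {"variable": "B", "start": "1"},
    {"variable": "gnd_cd", "start": "3"},
    {"variable": "gnd_cd", "start": "@0"},
    {"variable": "C", "start": "6"},
]
indexed = add_run_index(rows)
for r in indexed:
    print(r["variable"], r["index"])
```

As with NOTSORTED in the data step, a value that shows up again later in a separate, non-adjacent run would start a new group and receive a new index.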

Answer 1: (score: 0)

You can create a new table with something like:

    select variable, monotonic() as newindex from (select distinct variable from mydata);

Then join it back to the original table. In fact, you can do it all in one step:

    select a.variable, a.new_variable, a.type, a.start, a.end, a.woe, b.newindex as index
    from mydata as a
    left join (select variable, monotonic() as newindex
               from (select distinct variable from mydata)) as b
    on a.variable = b.variable;

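Or something along those lines. I can't say I understand 100% what you are trying to achieve, but maybe this helps.

Note that the monotonic() function in SAS is still (I believe) undocumented. That means SAS may or may not keep including it. It works, but perhaps they consider it experimental.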
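
Since monotonic() is undocumented, the same "number the distinct values, then join back" idea can be tried with the standard ROW_NUMBER() window function in engines that support window functions, such as PostgreSQL. The sketch below is not the answerer's method: ROW_NUMBER() is swapped in for monotonic(), SQLite (via Python's sqlite3 module) stands in for the database so the example is self-contained, and the table contents are a made-up subset of tableA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed version of the question's table.
conn.execute("CREATE TABLE mydata (variable TEXT, start TEXT, woe REAL)")
conn.executemany(
    "INSERT INTO mydata VALUES (?, ?, ?)",
    [("A", "1", 1.3), ("A", "0", 0.6), ("B", "1", 5.4),
     ("gnd_cd", "3", 1.3), ("gnd_cd", "2", 5.4), ("C", "6", 2.0)],
)

query = """
SELECT a.variable, a.start, a.woe, b.newindex AS "index"
FROM mydata AS a
LEFT JOIN (
    -- de-duplicate first, then number the distinct values
    SELECT variable, ROW_NUMBER() OVER (ORDER BY variable) AS newindex
    FROM (SELECT DISTINCT variable FROM mydata) AS d
) AS b ON a.variable = b.variable
ORDER BY b.newindex, a.start
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

De-duplicating in the inner query before numbering matters: numbering and DISTINCT in the same SELECT would give every input row its own number, so nothing would collapse. Note that the numbers here follow the sort order of variable, not the order in which values first appear.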

Answer 2: (score: 0)

Consider using Postgres's dense_rank window function for gap-free ranking:

SELECT *, DENSE_RANK() OVER (ORDER BY variable) as "index"
FROM mytable

Rextester demo (randomly seeded data)
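
One thing to be aware of: ranking with ORDER BY variable numbers the groups in sort order, so a value such as C would be ranked ahead of gnd_cd instead of receiving index 5 as in the question's tableA. SQL tables have no inherent row order, so reproducing the first-appearance numbering needs a column that records it. The sketch below assumes a hypothetical row_id column for that purpose and uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL; the exact sort order in PostgreSQL depends on the collation in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# row_id is a hypothetical column recording the original row order.
conn.execute("CREATE TABLE mytable (row_id INTEGER, variable TEXT)")
values = ["A", "A", "B", "gnd_cd", "gnd_cd", "mar_sign", "C", "C"]
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    list(enumerate(values, start=1)),
)

# Rank by the value itself: groups are numbered in sort order.
by_value = conn.execute("""
    SELECT variable, DENSE_RANK() OVER (ORDER BY variable) AS "index"
    FROM mytable
    ORDER BY row_id
""").fetchall()

# Rank by each group's first row: groups are numbered in order of appearance.
by_first_seen = conn.execute("""
    SELECT t.variable, DENSE_RANK() OVER (ORDER BY f.first_row) AS "index"
    FROM mytable AS t
    JOIN (SELECT variable, MIN(row_id) AS first_row
          FROM mytable
          GROUP BY variable) AS f
      ON f.variable = t.variable
    ORDER BY t.row_id
""").fetchall()

print(by_value)
print(by_first_seen)
```

Both queries give every row of a group the same gap-free number; they differ only in which group is numbered first. If the numbering order does not matter, the simpler ORDER BY variable form is enough.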