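I want to add a column "index" such that rows with the same value in the column "variable" get the same "index" value. PostgreSQL or SAS syntax is fine.
One caveat: the values in the "variable" column change every day, as in tableA and tableB below, so hard-coding is not acceptable. Any suggestions are appreciated!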
+----------+--------------+------+-------+-----+-----+-------+
| variable | new_variable | type | start | end | woe | index |
+----------+--------------+------+-------+-----+-----+-------+
| A | mi_A | char | 1 | | 1.3 | 1 |
| A | mi_A | char | 0 | | 0.6 | 1 |
| B | mi_B | char | 1 | | 5.4 | 2 |
| B | mi_B | char | 0 | | 0.1 | 2 |
| gnd_cd | gnd_cd | char | 3 | | 1.3 | 3 |
| gnd_cd | gnd_cd | char | @0 | | 0.6 | 3 |
| gnd_cd | gnd_cd | char | 2 | | 5.4 | 3 |
| gnd_cd | gnd_cd | char | N | | 0.1 | 3 |
| gnd_cd | gnd_cd | char | 1 | | 1.3 | 3 |
| gnd_cd | gnd_cd | char | 99 | | 0.6 | 3 |
| mar_sign | mar_sign | char | 0 | | 5.4 | 4 |
| mar_sign | mar_sign | char | Y | | 0.1 | 4 |
| mar_sign | mar_sign | char | N | | 6 | 4 |
| C | C | char | 6 | | 2 | 5 |
| C | C | char | 7 | | 2.1 | 5 |
| C | C | char | 8 | | 2.2 | 5 |
+----------+--------------+------+-------+-----+-----+-------+
(tableA)
+--------------+--------------+------+-------+-----+-----+-------+
| variable | new_variable | type | start | end | woe | index |
+--------------+--------------+------+-------+-----+-----+-------+
| D | mi_D | char | 1 | | 1 | 1 |
| D | mi_D | char | 0 | | 2 | 1 |
| E | mi_E | char | 1 | | 2 | 2 |
| E | mi_E | char | 0 | | 3 | 2 |
| education_bg | education_bg | char | 3 | | 1 | 3 |
| education_bg | education_bg | char | @0 | | 5 | 3 |
| education_bg | education_bg | char | 2 | | 6 | 3 |
| education_bg | education_bg | char | N | | 4 | 3 |
| education_bg | education_bg | char | 1 | | 3 | 3 |
| education_bg | education_bg | char | 99 | | 3 | 3 |
| sex | sex | char | 0 | | 2 | 4 |
| sex | sex | char | Y | | 1 | 4 |
| sex | sex | char | N | | 0 | 4 |
| C | C | char | 6 | | 6 | 5 |
| C | C | char | 7 | | 4 | 5 |
| C | C | char | 8 | | 1 | 5 |
+--------------+--------------+------+-------+-----+-----+-------+
(tableB)
Answer 0 (score: 2)
You can do this in SAS in a single data step by keeping a retained counter and using BY-group processing on variable.
Code:
data have;
infile datalines dlm='|';
input variable $ new_variable $ type $ start $ end $ woe ;
datalines;
| A | mi_A | char | 1 | | 1.3
| A | mi_A | char | 0 | | 0.6
| B | mi_B | char | 1 | | 5.4
| B | mi_B | char | 0 | | 0.1
| gnd_cd | gnd_cd | char | 3 | | 1.3
| gnd_cd | gnd_cd | char | @0 | | 0.6
| gnd_cd | gnd_cd | char | 2 | | 5.4
| gnd_cd | gnd_cd | char | N | | 0.1
| gnd_cd | gnd_cd | char | 1 | | 1.3
| gnd_cd | gnd_cd | char | 99 | | 0.6
| mar_sign | mar_sign | char | 0 | | 5.4
| mar_sign | mar_sign | char | Y | | 0.1
| mar_sign | mar_sign | char | N | | 6
| C | C | char | 6 | | 2
| C | C | char | 7 | | 2.1
| C | C | char | 8 | | 2.2
run;
data want;
set have ;
by variable notsorted;
retain index;
if first.variable then index+1;
run;
Note: I create index and increment it only when a new group value starts.
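For readers without SAS at hand, the same run-based counting can be sketched in Python. This is only an illustration of the logic, not a translation of the program above; the function name `add_run_index` and the use of plain dictionaries as rows are assumptions made for the example.

```python
from typing import Dict, List


def add_run_index(rows: List[Dict[str, str]], key: str = "variable") -> List[Dict[str, object]]:
    """Return copies of rows with an 'index' that increases whenever `key` changes.

    Mirrors `by variable notsorted` plus `if first.variable then index+1`:
    each row is compared only with the previous row, so no sorting is needed.
    """
    result = []
    index = 0
    previous = object()  # sentinel that never equals a real value
    for row in rows:
        if row[key] != previous:
            # A new run of identical values starts here.
            index += 1
            previous = row[key]
        result.append({**row, "index": index})
    return result


# Hypothetical sample mirroring the order of values in tableA.
rows = [{"variable": v} for v in ["A", "A", "B", "B", "gnd_cd", "mar_sign", "mar_sign", "C"]]
indexed = add_run_index(rows)
print([r["index"] for r in indexed])  # [1, 1, 2, 2, 3, 4, 4, 5]
```

As with NOTSORTED in SAS, a value that reappears later in a separate run would receive a new index, so this approach assumes equal values are stored contiguously.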
Answer 1 (score: 0)
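You can create a new table along these lines:
select variable, monotonic() as newindex from (select distinct variable from mydata);
Then join it back to the original table. In fact, you can do it all in one step:
select a.variable, a.new_variable, a.type, a.start, a.end, a.woe, b.newindex as index from mydata as a left join (select variable, monotonic() as newindex from (select distinct variable from mydata)) as b on a.variable=b.variable;
Or something along those lines. The distinct values are selected first so that monotonic() numbers each value once rather than each row. I can't say I understand 100% what you are trying to achieve, but maybe this helps.
Note that the monotonic() function in SAS is still (I believe) undocumented. All that means is that SAS may or may not continue to include it. It works, but perhaps they consider it experimental.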
Answer 2 (score: 0)
Consider using Postgres' dense_rank window function for gap-free ranking:
SELECT *, DENSE_RANK() OVER (ORDER BY variable) as "index"
FROM mytable
Rextester demo (randomly seeded data)
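One thing to keep in mind with ORDER BY variable: the ranks follow the sort order of the values, not the order in which they appear in the table, so in tableA the value C would likely receive 3 rather than the 5 shown in the desired output. The Python sketch below contrasts the two numbering schemes. The helper names are hypothetical, and Python's default string comparison stands in for the database collation, which may order values differently.

```python
from typing import Dict, List


def dense_rank(values: List[str]) -> List[int]:
    """Gap-free rank by sorted distinct value, like DENSE_RANK() OVER (ORDER BY v)."""
    rank_of: Dict[str, int] = {
        v: i for i, v in enumerate(sorted(set(values)), start=1)
    }
    return [rank_of[v] for v in values]


def first_appearance_index(values: List[str]) -> List[int]:
    """Number each distinct value by the order in which it first appears."""
    seen: Dict[str, int] = {}
    # setdefault stores the next number only the first time a value is seen.
    return [seen.setdefault(v, len(seen) + 1) for v in values]


# Hypothetical sample mirroring the order of values in tableA.
values = ["A", "A", "B", "gnd_cd", "mar_sign", "C", "C"]
print(dense_rank(values))              # [1, 1, 2, 4, 5, 3, 3]
print(first_appearance_index(values))  # [1, 1, 2, 3, 4, 5, 5]
```

If only "same value, same number" matters, either scheme satisfies the requirement; the difference matters only when the numbers must follow the row order of the table.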