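I want to create a rank column from an existing rank column and a binary column. For example, suppose a table has ID, RISK, CONTACT, DATE. The existing rank is RISK, with values such as 1, 2, 3, NULL, where 3 is the highest. The binary value is CONTACT, with values 0/1 or FAILURE/SUCCESS. I want to create a new RANK that orders IDs by RISK, but only once an ID has reached a certain number of successful contacts.
For example, suppose the constraint is a minimum of 2 successful contacts. The rank should then be created as follows in these two cases:
Case 1. Three IDs, each with at least two successful contacts. In this case, the rank mirrors the risk: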
ID risk contact date rank
1 3 S 1 3
1 3 S 2 3
1 3 F 3 3
1 3 F 4 3
2 2 S 1 2
2 2 S 2 2
2 2 F 3 2
2 2 F 4 2
3 1 S 1 1
3 1 S 2 1
3 1 S 3 1
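Case 2. Suppose ID = 1 has only one successful contact. In that case it is demoted to the lowest rank, rank = 1, while ID = 2 gets the highest value, rank = 3, and ID = 3 maps to rank = 2, because it satisfies the constraint but has a lower risk value than ID = 2: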
ID risk contact date rank
1 3 S 1 1
1 3 F 2 1
1 3 F 3 1
1 3 F 4 1
2 2 S 1 3
2 2 S 2 3
2 2 F 3 3
2 2 F 4 3
3 1 S 1 2
3 1 S 2 2
3 1 S 3 2
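This is SQL, specifically Hive. Thanks in advance.
Edit: I think Gordon Linoff's code does this correctly. In the end, I used three temporary tables. The code looks like this:
First,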
-- Convert risk and contact to numeric values
select A.* ,
case when A.risk = 'H' then 3
when A.risk = 'M' then 2
when A.risk = 'L' then 1
when A.risk is NULL then NULL
when A.risk = 'NULL' then NULL
else -999 end as RISK_RANK,
case when A.contact = 'Successful' then 1
else NULL end as success
-- FROM clause was missing in the original snippet; T is assumed, as in the later steps
from T as A
Second,
-- sum_successes_by_risk
select A.* ,
B.sum_successes_by_risk
from T as A
inner join
(select A.person, A.program, A.risk, sum(a.success) as sum_successes_by_risk
from T as A
group by A.person, A.program, A.risk
) as B
on A.program = B.program
and A.person = B.person
and A.risk = B.risk
Third,
--Create table that contains only max risk category
select A.* ,
B.max_risk_rank
from T as A
inner join
(select A.person, max(A.risk_rank) as max_risk_rank
from T as A
group by A.person
) as B
on A.person = B.person
and A.risk_rank = B.max_risk_rank
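A note on the first two steps, offered as a sketch rather than a tested Hive job. Joining back on `A.risk = B.risk` silently drops rows whose risk is NULL, because NULL never compares equal to NULL, and `sum(success)` returns NULL rather than 0 when a group has no successful contact. A single conditional aggregation avoids both. The snippet below runs the query in SQLite through Python only so the logic can be checked; the sample rows are made up, the column names follow the code above, and the simplified `case` omits the 'NULL' string and -999 branches.

```python
import sqlite3

# In-memory stand-in for the source table; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table T (person int, program text, risk text, contact text)")
conn.executemany(
    "insert into T values (?, ?, ?, ?)",
    [
        (1, "P", "H", "Successful"),
        (1, "P", "H", "Failed"),
        (2, "P", "M", "Successful"),
        (2, "P", "M", "Successful"),
        (3, "P", None, "Failed"),  # NULL risk: kept by group by, lost by the join
    ],
)

# Steps one and two folded together: map risk to a number and
# count successes per (person, program, risk) in one pass.
query = """
select person, program, risk,
       case risk when 'H' then 3 when 'M' then 2 when 'L' then 1 end as risk_rank,
       sum(case when contact = 'Successful' then 1 else 0 end) as sum_successes_by_risk
from T
group by person, program, risk
order by person
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```

Person 3 stays in the result with a count of 0, which is what a later "at least N successes" test needs.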
Answer 0 (score: 0)
This is hard to follow, but I think you just need window functions.
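Here is a minimal sketch of what a window-function version could look like. It is not the code originally posted in this answer. It assumes a table named `contacts` with columns `id`, `risk`, `contact` ('S' or 'F') and `dt` (renamed from DATE, which is a reserved word in Hive), and it is run in SQLite through Python purely to check the logic against Case 2 from the question. `sum(...) over (partition by ...)` and `dense_rank()` are also available in Hive, but I have not run this exact query there.

```python
import sqlite3

# Case 2 from the question: ID 1 has only one successful contact.
conn = sqlite3.connect(":memory:")
conn.execute("create table contacts (id int, risk int, contact text, dt int)")
conn.executemany(
    "insert into contacts values (?, ?, ?, ?)",
    [
        (1, 3, "S", 1), (1, 3, "F", 2), (1, 3, "F", 3), (1, 3, "F", 4),
        (2, 2, "S", 1), (2, 2, "S", 2), (2, 2, "F", 3), (2, 2, "F", 4),
        (3, 1, "S", 1), (3, 1, "S", 2), (3, 1, "S", 3),
    ],
)

# Inner query: flag each row with whether its ID has at least 2 successes.
# Outer query: rank by (flag, risk), so every ID that misses the minimum
# sorts below every ID that meets it, and risk breaks ties within each group.
query = """
select id, risk, contact, dt,
       dense_rank() over (order by meets_min, risk) as new_rank
from (
    select c.*,
           case when sum(case when contact = 'S' then 1 else 0 end)
                     over (partition by id) >= 2
                then 1 else 0 end as meets_min
    from contacts c
) flagged
order by id, dt
"""
result = conn.execute(query).fetchall()
rank_by_id = {row[0]: row[4] for row in result}
print(rank_by_id)
```

Because this uses `dense_rank()`, two IDs with the same flag and the same risk share a rank. If each ID needs a distinct rank, a tie-breaker such as `id` would have to be added to the `order by`.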