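I'm not new to SQL, but I am new to PostgreSQL, and I'm really struggling to adapt my existing knowledge to a different environment.
I'm trying to create a variable that captures whether someone stays active, skips, or churns in a 0/1 time-series variable. For example, in the data below, my dataset would contain the variables id, time, and voted, and I would create the variable "skipped":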
id  time  voted  skipped
1   1     1      active
1   2     0      skipped
1   3     1      active
2   1     1      active
2   2     0      churned
2   3     0      churned
3   1     1      active
3   2     1      active
3   3     0      churned
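The rule for coding "skipped" is simple: if a 1 is the last record, the person is "active" and any zeros count as "skipped", but if a 0 is the last record, the person has "churned".
The time-2 record for id = 1 is a skip because id 1 is 0 at time 2 but non-zero at time 3. In the other two cases, 0 is the final value, so they have "churned". Can anyone help? I've been staring at this all day and am hitting a wall.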
Answer 0 (score: 2)
This isn't particularly elegant, but it should do what you need:

with votes as (
  select
    id, time, voted,
    -- latest time seen for this id, repeated on every one of its rows
    max(time) over (partition by id) as max_time
  from voter_data
)
select
  v1.id, v1.time, v1.voted,
  case
    when v1.voted = 1 then 'active'    -- a vote is always active
    when v2.voted = 1 then 'skipped'   -- no vote now, but the last record is a vote
    else 'churned'                     -- no vote now, and the last record is not a vote
  end as skipped
from
  votes v1
  join votes v2 on
    v1.id = v2.id and
    v1.max_time = v2.time

In short, we first figure out which record is the last one for each voter id, then self-join on the resulting table to isolate only that last record.
This could produce multiple results if two votes for the same id can occur at the same time. If that is the case, you would want row_number() instead of max().
On your sample data, this returns the same skipped values shown in your question.
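
If two votes for the same id can share a time, the max(time) self-join matches both of them and every row for that id is duplicated. A minimal sketch of the row_number() variant follows. It assumes the same id, time, and voted columns; the inline voter_data CTE only stands in for the real table so the query can be run on its own, and which of the tied rows counts as "last" is arbitrary unless you add a tie-breaker to the ORDER BY.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
WITH voter_data (id, time, voted) AS (
  VALUES
    (1, 1, 1), (1, 2, 0), (1, 3, 1),
    (2, 1, 1), (2, 2, 0), (2, 3, 0),
    (3, 1, 1), (3, 2, 1), (3, 3, 0)
),
votes AS (
  SELECT
    id, time, voted,
    -- rn = 1 marks exactly one "last" row per id, even when times tie
    row_number() OVER (PARTITION BY id ORDER BY time DESC) AS rn
  FROM voter_data
)
SELECT
  v1.id, v1.time, v1.voted,
  CASE
    WHEN v1.voted = 1 THEN 'active'
    WHEN v2.voted = 1 THEN 'skipped'
    ELSE 'churned'
  END AS skipped
FROM votes v1
JOIN votes v2
  ON v1.id = v2.id
 AND v2.rn = 1          -- join each row to its id's single last row
ORDER BY v1.id, v1.time;
```

Because exactly one row per id has rn = 1, the join can never multiply rows, which is the point of swapping max() for row_number() here.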
Answer 1 (score: 1)
Window functions can help with readability.
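WITH
add_last_voted_status AS (
    SELECT
        *
        -- The explicit frame matters: with ORDER BY and no frame clause, the
        -- default frame ends at the current row, so LAST_VALUE would simply
        -- return the current row's own voted value.
        , LAST_VALUE(voted) OVER (
            PARTITION BY id
            ORDER BY time
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS last_voted_status
    FROM voter_data  -- assumed table name; the bare word "table" is reserved
)
SELECT
    id
    , time
    , voted
    , CASE
        -- A vote is always 'active', matching the question's expected output.
        WHEN voted = 1
            THEN 'active'
        WHEN voted = 0 AND last_voted_status = 1
            THEN 'skipped'
        WHEN voted = 0 AND last_voted_status = 0
            THEN 'churned'
        ELSE '?'
    END AS skipped
FROM add_last_voted_status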
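
One detail worth checking with LAST_VALUE: when a window has an ORDER BY but no explicit frame, PostgreSQL's default frame runs from the start of the partition to the current row (plus any peers), so LAST_VALUE returns the current row's value, not the partition's final one. The sketch below compares both forms side by side on the question's sample rows. The inline voter_data CTE and the column names default_frame and full_frame are illustrative only.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
WITH voter_data (id, time, voted) AS (
  VALUES
    (1, 1, 1), (1, 2, 0), (1, 3, 1),
    (2, 1, 1), (2, 2, 0), (2, 3, 0),
    (3, 1, 1), (3, 2, 1), (3, 3, 0)
)
SELECT
  id, time, voted,
  -- Default frame: ends at the current row, so this just echoes voted.
  LAST_VALUE(voted) OVER (
    PARTITION BY id ORDER BY time
  ) AS default_frame,
  -- Full-partition frame: the voted value at the latest time for the id.
  LAST_VALUE(voted) OVER (
    PARTITION BY id ORDER BY time
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS full_frame
FROM voter_data
ORDER BY id, time;
```

On this data, default_frame equals voted on every row, while full_frame is 1 for id 1 and 0 for ids 2 and 3. An alternative with the same effect is FIRST_VALUE(voted) with ORDER BY time DESC, which does not need the extra frame clause.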