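I am trying to reproduce the sample output table below, but I'm not sure how to proceed. I have tried using the LAG function, with limited success: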
CASE WHEN LAG(mode, 1) OVER (PARTITION BY Cluster_Name, Node_Name ORDER BY date) != mode
     THEN date
END
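This finds the dates on which the mode switches, but I'm not sure how to get from there to the sample output table.
A cycle begins when the first gas mode/volume is recorded and ends when the last water volume is recorded. For example, Foxtrot has only 1 cycle even though it has a water reading on 3/18/2019, because no gas phase occurred before that reading.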
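To make the cycle rule concrete, here is a minimal sketch in plain Python of how the rule could be applied to one node's readings in date order. The function name `label_cycles` is hypothetical, and the sketch assumes only the two modes `gas` and `water` appear.

```python
def label_cycles(modes):
    """Return a cycle number (or None) for each reading of one node, in date order."""
    cycle = 0
    prev = None
    labels = []
    for mode in modes:
        # A new cycle opens whenever a gas reading follows a non-gas reading
        # (or is the very first reading).
        if mode == "gas" and prev != "gas":
            cycle += 1
        # Readings recorded before the first gas phase belong to no cycle.
        labels.append(cycle if cycle > 0 else None)
        prev = mode
    return labels


# Foxtrot's readings: the leading water reading is not part of any cycle.
foxtrot = ["water", "gas", "gas", "water"]
print(label_cycles(foxtrot))  # [None, 1, 1, 1]
```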
Raw data
| Cluster_Name | Node_Name | Mode  | volume | date      | *Annotation Only* |
|--------------|-----------|-------|--------|-----------|-------------------|
| Cluster A    | Foxtrot   | water | 100    | 3/18/2019 |                   |
| Cluster A    | Foxtrot   | gas   | 200    | 3/19/2019 | Cycle 1           |
| Cluster A    | Foxtrot   | gas   | 200    | 3/20/2019 |                   |
| Cluster A    | Foxtrot   | water | 100    | 3/21/2019 |                   |
| Cluster B    | Alpha     | water | 820    | 4/29/2018 |                   |
| Cluster B    | Alpha     | gas   | 500    | 4/30/2018 | Cycle 1           |
| Cluster B    | Alpha     | gas   | 500    | 5/1/2018  |                   |
| Cluster B    | Alpha     | gas   | 500    | 5/2/2018  |                   |
| Cluster B    | Alpha     | water | 1,000  | 5/3/2018  |                   |
| Cluster B    | Alpha     | water | 1,000  | 5/4/2018  |                   |
| Cluster B    | Alpha     | water | 1,000  | 5/5/2018  |                   |
| Cluster B    | Alpha     | gas   | 300    | 5/6/2018  | Cycle 2           |
| Cluster B    | Alpha     | gas   | 300    | 5/7/2018  |                   |
| Cluster B    | Alpha     | water | 2,000  | 5/8/2018  |                   |
| Cluster B    | Alpha     | gas   | 300    | 5/9/2018  | Cycle 3           |
| Cluster B    | Alpha     | water | 2,000  | 5/10/2018 |                   |
| Cluster B    | Alpha     | gas   | 1,500  | 5/11/2018 | Cycle 4           |
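Sample output table
This table acts as a kind of pivot table: it sums volume, aggregated by cluster, node, and cycle number, with one row per mode.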
| Cluster_Name | Node_Name | Mode | Total_Volume | Cycle # |
|--------------|-----------|-------|--------------|---------|
| Cluster A | Foxtrot | gas | 400 | Cycle 1 |
| Cluster A | Foxtrot | water | 100 | Cycle 1 |
| Cluster B | Alpha | gas | 1,500 | Cycle 1 |
| Cluster B | Alpha | water | 3,000 | Cycle 1 |
| Cluster B | Alpha | gas | 600 | Cycle 2 |
| Cluster B | Alpha | water | 2,000 | Cycle 2 |
| Cluster B | Alpha | gas | 300 | Cycle 3 |
| Cluster B | Alpha | water | 1,200 | Cycle 3 |
| Cluster B | Alpha | gas | 1,500 | Cycle 4 |
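Answer 0 (score: 1)
Assuming you're using Oracle or MySQL 8 (given that you mention LAG(), which older versions of MySQL don't have).
This also assumes that there are only ever two modes, and that you never need any readings from the first mode (each node's leading run of readings, such as Foxtrot's water reading on 3/18/2019, is discarded).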
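WITH
  sorted_data AS
(
  SELECT
    your_data.*,
    -- position of each reading within its node, by date
    ROW_NUMBER() OVER (PARTITION BY cluster_name, node_name
                           ORDER BY date
                      )
                        AS node_seq_num,
    -- position of each reading within its node and mode, by date
    ROW_NUMBER() OVER (PARTITION BY cluster_name, node_name, mode
                           ORDER BY date
                      )
                        AS node_mode_seq_num
  FROM
    your_data
),
  aggregated_data AS
(
  SELECT
    cluster_name,
    node_name,
    mode,
    MIN(date)     AS first_date,
    MAX(date)     AS final_date,
    SUM(volume)   AS total_volume,
    -- number each node's runs in date order: 1 = leading run, 2 = first run after a switch, ...
    -- (partitioning by the GROUP BY keys themselves would always yield 1)
    ROW_NUMBER() OVER (PARTITION BY cluster_name, node_name
                           ORDER BY MIN(date)
                      )
                        AS node_mode_group_seq_num
  FROM
    sorted_data
  GROUP BY
    cluster_name,
    node_name,
    mode,
    -- constant within a run of consecutive same-mode readings
    node_seq_num - node_mode_seq_num
)
SELECT
  aggregated_data.*,
  -- runs 2 and 3 form cycle 1, runs 4 and 5 form cycle 2, ...
  FLOOR(node_mode_group_seq_num / 2)   AS cycle_num
FROM
  aggregated_data
WHERE
  node_mode_group_seq_num > 1
ORDER BY
  cluster_name,
  node_name,
  mode,
  node_mode_group_seq_num
(Neither Oracle nor MySQL truncates the result of /, hence the FLOOR(). TRUNC(node_mode_group_seq_num / 2) also works in Oracle, and node_mode_group_seq_num DIV 2 in MySQL.)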
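As a way to try this gaps-and-islands approach locally, here is a minimal sketch that runs it against the question's raw data in an in-memory SQLite database from Python. Several things here are assumptions rather than part of the answer: SQLite (3.25 or later, for window functions) stands in for Oracle/MySQL, the dates are rewritten as ISO text so that they sort correctly, the run numbering is pulled into its own CTE (`numbered`), and the helper name `cycle_totals` is hypothetical. The table name `your_data` is kept from the answer.

```python
import sqlite3

# Raw data from the question; dates rewritten as ISO text (an assumption) so they sort.
ROWS = [
    ("Cluster A", "Foxtrot", "water", 100, "2019-03-18"),
    ("Cluster A", "Foxtrot", "gas", 200, "2019-03-19"),
    ("Cluster A", "Foxtrot", "gas", 200, "2019-03-20"),
    ("Cluster A", "Foxtrot", "water", 100, "2019-03-21"),
    ("Cluster B", "Alpha", "water", 820, "2018-04-29"),
    ("Cluster B", "Alpha", "gas", 500, "2018-04-30"),
    ("Cluster B", "Alpha", "gas", 500, "2018-05-01"),
    ("Cluster B", "Alpha", "gas", 500, "2018-05-02"),
    ("Cluster B", "Alpha", "water", 1000, "2018-05-03"),
    ("Cluster B", "Alpha", "water", 1000, "2018-05-04"),
    ("Cluster B", "Alpha", "water", 1000, "2018-05-05"),
    ("Cluster B", "Alpha", "gas", 300, "2018-05-06"),
    ("Cluster B", "Alpha", "gas", 300, "2018-05-07"),
    ("Cluster B", "Alpha", "water", 2000, "2018-05-08"),
    ("Cluster B", "Alpha", "gas", 300, "2018-05-09"),
    ("Cluster B", "Alpha", "water", 2000, "2018-05-10"),
    ("Cluster B", "Alpha", "gas", 1500, "2018-05-11"),
]

QUERY = """
WITH sorted_data AS (
    SELECT d.*,
           ROW_NUMBER() OVER (PARTITION BY cluster_name, node_name
                              ORDER BY date) AS node_seq_num,
           ROW_NUMBER() OVER (PARTITION BY cluster_name, node_name, mode
                              ORDER BY date) AS node_mode_seq_num
    FROM your_data d
),
islands AS (
    -- one row per run of consecutive readings in the same mode
    SELECT cluster_name, node_name, mode,
           MIN(date)   AS first_date,
           SUM(volume) AS total_volume
    FROM sorted_data
    GROUP BY cluster_name, node_name, mode, node_seq_num - node_mode_seq_num
),
numbered AS (
    -- number the runs of each node in date order
    SELECT islands.*,
           ROW_NUMBER() OVER (PARTITION BY cluster_name, node_name
                              ORDER BY first_date) AS island_seq
    FROM islands
)
SELECT cluster_name, node_name, mode, total_volume,
       island_seq / 2 AS cycle_num   -- SQLite divides two integers as integers
FROM numbered
WHERE island_seq > 1                 -- drop each node's leading run
ORDER BY cluster_name, node_name, island_seq
"""


def cycle_totals(rows):
    """Load rows into an in-memory table and return one row per (node, cycle, mode)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_data ("
        "cluster_name TEXT, node_name TEXT, mode TEXT, volume INTEGER, date TEXT)"
    )
    conn.executemany("INSERT INTO your_data VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in cycle_totals(ROWS):
    print(row)
```

On the raw rows shown in the question, the water total for Alpha's cycle 3 comes out as 2,000 (the single reading on 5/10/2018), whereas the sample output table lists 1,200 for that row.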
Answer 1 (score: 0)
You seem to want lag(), like this:
select rd.*
from (select rd.*,
             lag(mode) over (partition by cluster_name, node_name order by date) as prev_mode
      from rawdata rd
     ) rd
where prev_mode <> mode;
Usually, when doing this, I also want the first record. That would be:
where prev_mode is null or prev_mode <> mode
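To see what difference the `prev_mode is null` check makes, here is a small sketch that runs both variants of the filter against Foxtrot's rows in an in-memory SQLite database from Python. SQLite (3.25 or later), the ISO date strings, and the helper name `mode_switches` are assumptions for illustration; the table name `rawdata` follows the query in this answer.

```python
import sqlite3

# Foxtrot's rows from the question; dates rewritten as ISO text (an assumption).
ROWS = [
    ("Cluster A", "Foxtrot", "water", 100, "2019-03-18"),
    ("Cluster A", "Foxtrot", "gas", 200, "2019-03-19"),
    ("Cluster A", "Foxtrot", "gas", 200, "2019-03-20"),
    ("Cluster A", "Foxtrot", "water", 100, "2019-03-21"),
]

BASE_QUERY = """
SELECT mode, date
FROM (SELECT rd.*,
             LAG(mode) OVER (PARTITION BY cluster_name, node_name
                             ORDER BY date) AS prev_mode
      FROM rawdata rd
     ) s
WHERE {condition}
ORDER BY date
"""


def mode_switches(rows, include_first):
    """Return (mode, date) for every row whose mode differs from the previous row's."""
    condition = "prev_mode <> mode"
    if include_first:
        # The first row of each node has no previous row, so prev_mode is NULL
        # and the plain <> comparison would filter it out.
        condition = "prev_mode IS NULL OR " + condition
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rawdata ("
        "cluster_name TEXT, node_name TEXT, mode TEXT, volume INTEGER, date TEXT)"
    )
    conn.executemany("INSERT INTO rawdata VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(BASE_QUERY.format(condition=condition)).fetchall()
    conn.close()
    return result


print(mode_switches(ROWS, include_first=False))
# [('gas', '2019-03-19'), ('water', '2019-03-21')]
print(mode_switches(ROWS, include_first=True))
# [('water', '2019-03-18'), ('gas', '2019-03-19'), ('water', '2019-03-21')]
```

Note that this only marks where the mode changes; turning those rows into cycle numbers and per-cycle totals still needs a further grouping step.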