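Identifying patterns in SQL. Is it possible?

Asked: 2019-03-21 14:57:46

Tags: mysql sql oracle

I am trying to reproduce the sample output table below, but I am not sure how to proceed. I have tried the LAG function, with limited success: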

CASE WHEN LAG(mode, 1) OVER (PARTITION BY Cluster_Name, Node_Name ORDER BY date) != mode THEN date END

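This finds the dates on which the mode switches, but I am not sure how to get from there to the sample output table.

A cycle starts when the first gas mode/volume is recorded and ends when the last water volume is recorded. For example, Foxtrot has only 1 cycle even though it has a water reading on 3/18/2019, because no gas cycle had started before that reading.

Raw data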

| Cluster_Name | Node_Name | Mode  | volume | date      | *Annotation Only*|
|--------------|-----------|-------|--------|-----------|------------------|         
| Cluster A    | Foxtrot   | water | 100    | 3/18/2019 |                            
| Cluster A    | Foxtrot   | gas   | 200    | 3/19/2019 | Cycle 1                    
| Cluster A    | Foxtrot   | gas   | 200    | 3/20/2019 |                            
| Cluster A    | Foxtrot   | water | 100    | 3/21/2019 |                            
| Cluster B    | Alpha     | water | 820    | 4/29/2018 |                            
| Cluster B    | Alpha     | gas   | 500    | 4/30/2018 | Cycle 1                    
| Cluster B    | Alpha     | gas   | 500    | 5/1/2018  |                            
| Cluster B    | Alpha     | gas   | 500    | 5/2/2018  |                            
| Cluster B    | Alpha     | water | 1,000  | 5/3/2018  |                            
| Cluster B    | Alpha     | water | 1,000  | 5/4/2018  |                            
| Cluster B    | Alpha     | water | 1,000  | 5/5/2018  |                            
| Cluster B    | Alpha     | gas   | 300    | 5/6/2018  | Cycle 2                    
| Cluster B    | Alpha     | gas   | 300    | 5/7/2018  |                            
| Cluster B    | Alpha     | water | 2,000  | 5/8/2018  |                            
| Cluster B    | Alpha     | gas   | 300    | 5/9/2018  | Cycle 3                    
| Cluster B    | Alpha     | water | 2,000  | 5/10/2018 |                            
| Cluster B    | Alpha     | gas   | 1,500  | 5/11/2018 | Cycle 4                    

Sample output table

This table acts as a kind of pivot table: it sums the volume per mode, grouped by cluster/node/cycle number.

| Cluster_Name | Node_Name | Mode  | Total_Volume | Cycle # |
|--------------|-----------|-------|--------------|---------|
| Cluster A    | Foxtrot   | gas   | 400          | Cycle 1 |
| Cluster A    | Foxtrot   | water | 100          | Cycle 1 |
| Cluster B    | Alpha     | gas   | 1,500        | Cycle 1 |
| Cluster B    | Alpha     | water | 3,000        | Cycle 1 |
| Cluster B    | Alpha     | gas   | 600          | Cycle 2 |
| Cluster B    | Alpha     | water | 2,000        | Cycle 2 |
| Cluster B    | Alpha     | gas   | 300          | Cycle 3 |
| Cluster B    | Alpha     | water | 1,200        | Cycle 3 |
| Cluster B    | Alpha     | gas   | 1,500        | Cycle 4 |

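2 Answers:

Answer 0 (score: 1)

This assumes you are using Oracle or MySQL 8 (since you mention LAG(), which older versions of MySQL do not have).

It also assumes there are only ever two modes, and that you never need any readings from the first mode recorded for a node.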

WITH
    sorted_data AS
(
    SELECT
        rawdata.*,
        -- position of the row within its node
        ROW_NUMBER() OVER (PARTITION BY cluster_name, node_name
                               ORDER BY date
                          )
                             AS node_seq_num,
        -- position of the row within its node and mode
        ROW_NUMBER() OVER (PARTITION BY cluster_name, node_name, mode
                               ORDER BY date
                          )
                             AS node_mode_seq_num
    FROM
        rawdata
),
    aggregated_data AS
(
    SELECT
        cluster_name,
        node_name,
        mode,
        MIN(date)   AS first_date,
        MAX(date)   AS final_date,
        SUM(volume) AS total_volume,
        -- number the runs of each node in chronological order
        ROW_NUMBER() OVER (PARTITION BY cluster_name, node_name
                               ORDER BY MIN(date)
                          )
                             AS node_mode_group_seq_num
    FROM
        sorted_data
    GROUP BY
        cluster_name,
        node_name,
        mode,
        node_seq_num - node_mode_seq_num   -- constant within one run of a mode
)
SELECT
    aggregated_data.*,
    node_mode_group_seq_num / 2   AS cycle_num
FROM
    aggregated_data
WHERE
    node_mode_group_seq_num > 1   -- skip the first run of each node
ORDER BY
    cluster_name,
    node_name,
    node_mode_group_seq_num

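(The `/` operator returns a fractional result in both Oracle and MySQL, so the division has to be truncated. In Oracle, use TRUNC(node_mode_group_seq_num / 2). MySQL has no single-argument TRUNC(); use FLOOR(node_mode_group_seq_num / 2) or node_mode_group_seq_num DIV 2 instead.)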
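
The grouping key used above relies on a property that is easy to check by hand: within one node, the difference between the two row numbers stays constant across a run of consecutive rows with the same mode, and changes when that mode is interrupted. As a rough sketch, assuming the table is called `rawdata`, has the columns shown in the question, and stores `date` as a real date type rather than a string, the following query exposes the intermediate values (`run_key` is a name made up for this illustration):

```sql
-- Sketch only: show both row numbers and their difference for every row.
-- Assumes rawdata(cluster_name, node_name, mode, volume, date) as in the question.
SELECT
    cluster_name,
    node_name,
    mode,
    date,
    -- position of the row within its node
    ROW_NUMBER() OVER (PARTITION BY cluster_name, node_name
                           ORDER BY date) AS node_seq_num,
    -- position of the row within its node and mode
    ROW_NUMBER() OVER (PARTITION BY cluster_name, node_name, mode
                           ORDER BY date) AS node_mode_seq_num,
    -- hypothetical column name: the difference that identifies a run
    ROW_NUMBER() OVER (PARTITION BY cluster_name, node_name
                           ORDER BY date)
  - ROW_NUMBER() OVER (PARTITION BY cluster_name, node_name, mode
                           ORDER BY date) AS run_key
FROM
    rawdata
ORDER BY
    cluster_name,
    node_name,
    date;
```

For the four Foxtrot rows the difference works out to 0, 1, 1, 2, so the two consecutive gas rows share a key. The key is only meaningful together with the mode, which is why `mode` also appears in the GROUP BY.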

Answer 1 (score: 0)

You seem to want lag(), like this:

select rd.*
from (select rd.*,
             lag(mode) over (partition by cluster_name, node_name order by date) as prev_mode
      from rawdata rd
     ) rd
where prev_mode <> mode;

Typically, when doing this, I also want the first record. That would be:

where prev_mode is null or prev_mode <> mode
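
The lag() filter only finds the rows where the mode changes. One possible way to get from there to per-cycle totals is to count how many gas runs have started so far and aggregate on that running count. This is a sketch under assumptions, not a tested solution: it assumes the table is `rawdata` with the columns from the question, that `volume` is numeric, that each node has at most one row per date, and that the modes are spelled exactly 'gas' and 'water'. The CTE names `with_prev` and `numbered` are made up for illustration. On Oracle, `mode` and `date` may need to be quoted or renamed because they are reserved words.

```sql
-- Sketch only: number cycles with a running count of gas-run starts.
WITH with_prev AS (
    SELECT rd.*,
           LAG(mode) OVER (PARTITION BY cluster_name, node_name
                               ORDER BY date) AS prev_mode
    FROM rawdata rd
),
numbered AS (
    SELECT wp.*,
           -- a cycle starts on a gas row that does not follow another gas row
           SUM(CASE
                   WHEN mode = 'gas'
                    AND (prev_mode IS NULL OR prev_mode <> 'gas')
                   THEN 1
                   ELSE 0
               END)
               OVER (PARTITION BY cluster_name, node_name
                         ORDER BY date
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cycle_num
    FROM with_prev wp
)
SELECT
    cluster_name,
    node_name,
    mode,
    SUM(volume) AS total_volume,
    cycle_num
FROM numbered
WHERE cycle_num > 0   -- rows before the first gas run belong to no cycle
GROUP BY cluster_name, node_name, cycle_num, mode
ORDER BY cluster_name, node_name, cycle_num, mode;
```

Rows recorded before a node's first gas reading get a running count of 0 and are filtered out, which matches the rule in the question that the leading water reading for Foxtrot is not part of any cycle.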