How to group overlapping data in SQL

Time: 2012-12-31 11:28:04

Tags: sql vertica

I have data in the following form:

Prog_Id Low_latency  Max_Latency
a        1            4
a       -1            5
a        3            8
a       11           12
a       12           15

Now I want the output to look like this:

Prog_Id  Low_latency   Max_Latency
a          -1            8
a          11            15

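Basically, I want to merge the overlapping ranges. Can anyone help me with this? If there is a solution that uses the OVERLAPS clause, I can use times in place of the latency values.

Thanks, RISHABH

3 answers:

Answer 0 (score: 1)

My initial answer did not always work. It now looks like this: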

select distinct *
from (
   select
     t1.Prog_ID,
     min(least(t2.l, t1.Low_latency)) as Low_latency,
     max(greatest(t2.g, t1.Max_Latency)) as Max_Latency
   from yourtable t1 inner join (select
                                   t1.Prog_ID,
                                   least(t1.Low_latency, t2.Low_latency) l,
                                   greatest(t1.Max_Latency, t2.Max_Latency) g
                                 from
                                   yourtable t1 inner join yourtable t2
                                   on t1.Prog_ID=t2.Prog_ID
                                      and t1.Low_latency<=t2.Max_Latency
                                      and t1.Max_Latency>=t2.Low_latency) t2
     on t1.Prog_ID=t2.Prog_ID
        and t1.Low_latency<=t2.g
        and t1.Max_Latency>=t2.l
   -- Prog_ID is part of the grouping so that rows of different programs stay apart
   group by t1.Prog_ID, t1.Low_latency, t1.Max_Latency) s

See here. It is MySQL code, but it can be adapted to other DBMSs.
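
The join conditions in the query above treat two closed ranges as overlapping when each one starts no later than the other ends. A minimal Java sketch of that test, using ranges from the question (the class name OverlapCheck and the method overlaps are hypothetical, chosen only for illustration):

```java
final class OverlapCheck {
    // Closed ranges [lowA, maxA] and [lowB, maxB] overlap when each one
    // starts no later than the other ends (same test as the join condition).
    static boolean overlaps(double lowA, double maxA, double lowB, double maxB) {
        return lowA <= maxB && maxA >= lowB;
    }

    public static void main(String[] args) {
        System.out.println(overlaps(1, 4, -1, 5));    // true: 1..4 lies inside -1..5
        System.out.println(overlaps(3, 8, 11, 12));   // false: 8 is below 11
        System.out.println(overlaps(11, 12, 12, 15)); // true: both ranges contain 12
    }
}
```

Because the comparison uses <= and >=, ranges that only touch at an endpoint (11..12 and 12..15) count as overlapping, which matches the expected output in the question.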

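Answer 1 (score: 0)

It depends on which database server (DBMS) you are using, but there is no simple solution. You could use stored procedures, but I would prefer to do it in a programming language (which language are you using?).

After testing the queries from the other answers, I could not find a way to do this in SQL.

Here is something similar to a map-reduce in Java: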

import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class YourData {
    String Prog_Id;
    double Low_latency;
    double Max_Latency;

    YourData(String progId, double lowLatency, double maxLatency) {
        this.Prog_Id = progId;
        this.Low_latency = lowLatency;
        this.Max_Latency = maxLatency;
    }

    // add getter and setter here

    // If the two ranges overlap, widen this range to cover both and return true.
    public boolean testOverlapping(YourData data) {
        if (this.Low_latency <= data.Max_Latency && data.Low_latency <= this.Max_Latency) {
            this.Low_latency = Math.min(this.Low_latency, data.Low_latency);
            this.Max_Latency = Math.max(this.Max_Latency, data.Max_Latency);

            return true;
        }

        return false;
    }
}

String sql =
    "SELECT t1.Prog_Id, t1.Low_latency, t1.Max_Latency "
  + "FROM yourtable t1";

// One list of ranges per Prog_Id.
Map<String, List<YourData>> values = new LinkedHashMap<>();

// "connection" is assumed to be an already open java.sql.Connection.
try (Statement stmt = connection.createStatement();
     ResultSet row = stmt.executeQuery(sql)) {
    while (row.next()) {
        String progId = row.getString("Prog_Id");

        values.computeIfAbsent(progId, k -> new ArrayList<>())
              .add(new YourData(progId, row.getDouble("Low_latency"), row.getDouble("Max_Latency")));
    }
}

for (List<YourData> ranges : values.values()) {
    // Do the map-reduce for each Prog_Id: keep merging until no pair overlaps.
    boolean foundOverlapping;
    do {
        foundOverlapping = false;
        for (int i = 0; i < ranges.size(); i++) {
            YourData cur = ranges.get(i);

            // Walk backwards so that remove(x) never shifts an unvisited element.
            for (int x = ranges.size() - 1; x > i; x--) {
                if (cur.testOverlapping(ranges.get(x))) {
                    foundOverlapping = true;
                    ranges.remove(x);
                }
            }
        }

    } while (foundOverlapping);
}
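
The repeated pairwise merging above can also be written as a single pass over the ranges sorted by their lower bound. The following is only a sketch of that alternative, not part of the original answer: it handles the ranges of one Prog_Id at a time, represents each range as a {low, max} array, and the class name IntervalMerge and method merge are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class IntervalMerge {
    // Merges overlapping closed ranges given as {low, max} pairs.
    // Ranges that only touch at an endpoint are merged as well.
    static List<double[]> merge(List<double[]> ranges) {
        List<double[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingDouble((double[] r) -> r[0]));

        List<double[]> merged = new ArrayList<>();
        for (double[] r : sorted) {
            double[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && r[0] <= last[1]) {
                // r starts inside the current merged range: extend it if needed.
                last[1] = Math.max(last[1], r[1]);
            } else {
                // r starts after the current merged range ends: open a new one.
                merged.add(new double[] {r[0], r[1]});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // The rows of Prog_Id "a" from the question.
        List<double[]> input = Arrays.asList(
            new double[] {1, 4}, new double[] {-1, 5}, new double[] {3, 8},
            new double[] {11, 12}, new double[] {12, 15});
        for (double[] r : merge(input)) {
            System.out.println(r[0] + " " + r[1]); // prints "-1.0 8.0" then "11.0 15.0"
        }
    }
}
```

Sorting first means each range only has to be compared with the most recent merged range, so no outer do/while loop is needed.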

Answer 2 (score: -1)

Assuming you want to group by the lower latency in the pattern -infinity...9, 10...19, 20...29, you need something like:

SELECT
  Prog_Id,
  MIN(Low_latency) AS Low_latency,
  MAX(Max_Latency) AS Max_Latency
FROM
  your_table_name
GROUP BY
  Prog_Id,
  IF(FLOOR(Low_latency/10)<0,0,FLOOR(Low_latency/10))

Obviously the last line will depend on the RDBMS used, but it should look very similar in most cases.

You may also want to add an ORDER BY clause.
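
To see which bucket each row falls into, the grouping expression can be mirrored in Java. This is an illustrative sketch only; the class name LatencyBucket and the method bucket are hypothetical.

```java
final class LatencyBucket {
    // Mirrors IF(FLOOR(Low_latency/10) < 0, 0, FLOOR(Low_latency/10)):
    // every negative value falls into bucket 0.
    static long bucket(double lowLatency) {
        return Math.max(0L, (long) Math.floor(lowLatency / 10));
    }

    public static void main(String[] args) {
        double[] lows = {-1, 1, 3, 11, 12}; // Low_latency values from the question
        for (double low : lows) {
            System.out.println(low + " -> bucket " + bucket(low));
        }
        // A hypothetical pair of ranges 8..12 and 11..15 overlaps, yet the
        // two lower bounds land in different buckets.
        System.out.println(bucket(8) + " vs " + bucket(11));
    }
}
```

For the sample data the buckets happen to coincide with the overlapping groups (-1, 1 and 3 fall into bucket 0; 11 and 12 into bucket 1). Because the bucket depends only on Low_latency, overlapping ranges whose lower bounds straddle a multiple of 10 would not be grouped together.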