Question

I have a MySQL table that looks like this:

Column1   Column2   Column3   DateTime
14         10         15      2015-01-01 21:45:00
0          0          0       2015-01-01 21:46:00
12         8          16      2015-01-01 21:46:30
13         7          15      2015-01-01 21:47:00
0          0          0       2015-01-01 21:48:10
.          .          .       .
.          .          .       .
.          .          .       .
// Many non-zero rows here
.          .          .       .

14         10         15      2015-01-02 20:04:00
0          0          0       2015-01-02 20:04:30
12         8          16      2015-01-02 20:04:40
0          0          0       2015-01-02 20:04:50
10         5          2       2015-01-02 20:04:55
0          0          0       2015-01-02 20:05:00
11         4          8       2015-01-02 20:05:05
0          0          0       2015-01-02 20:05:10
12         15         16      2015-01-02 20:05:30
.          .          .       .
.          .          .       .
.          .          .       .
// Many other rows here where zeros don't occur so often.

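This represents the quality of a user's internet connection at a given moment. An all-zero row means the connection was dropped (note that a row never has only some of its columns at zero: either all of them are zero or all of them are non-zero). Judging by this sample data, the user's worst period was from 2015-01-02 20:04:30 to 2015-01-02 20:05:30, because the connection dropped four times within one minute. How can I find this period in C# (or in MySQL, if it isn't too much trouble)?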

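By the way, is there a specific name for this? I haven't had much luck googling for a solution, because most of the questions I found are trying to find the longest streak (only consecutive occurrences, not the ones closest together, which is what I need), or something along those lines.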

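Update: I was going to talk to one of my CS professors today, but I didn't manage to. I'll talk to him tomorrow. In the meantime, some friends and I have been thinking about how to solve this, and we've come up with something that we're not sure is the correct solution (as you can see, we're not very good at math/statistics). We thought about doing this: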

For each all-zero row in the table, we compare it with the last all-zero row and with every other one before that. Like this:

Compare first all-zero row with last one; 
Compare first all-zero row with second last one;
 ... 
Compare first all-zero row with the second one. 
Do it all over again starting on the second all-zero row this time.

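Then we get the worst time interval for this user's connection, which is the one where (Number of times the connection dropped in time interval T) divided by T has the greatest value. But, as I said before, we don't even know whether this gives us the right answer. Besides, it seems computationally quite expensive, and right now we have a database with several thousand rows.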

Answer 1

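OK, I've had more time to think about this. It's fairly easy to reason about in object-oriented pseudocode, since it basically comes down to finding a maximum in an array: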

int timeInterval = 30 (or however many seconds you want)
Sort all rows in ascending date/time order
Row worstStartRow = rows[0]
int worstNumBadConnections = 0
For each row X
    If X is defined as a dropped connection
        int tempNumBadConnections = 0
        For every subsequent row Y
            If (Y.time - X.time) > timeInterval
                break
            Else if Y is defined as a dropped connection
                tempNumBadConnections++
        If tempNumBadConnections > worstNumBadConnections
            worstNumBadConnections = tempNumBadConnections
            worstStartRow = X
// worst time interval starts at worstStartRow.time,
// ends at worstStartRow.time + timeInterval
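
As a rough illustration, the pseudocode above could be sketched in Python as follows. The function name worst_interval and the (timestamp, is_dropped) row shape are assumptions made for this example, not something from the question. Unlike the pseudocode, this sketch counts the starting drop itself, and because the rows are sorted it moves a single end pointer forward instead of rescanning the subsequent rows for every start.

```python
from datetime import datetime, timedelta


def worst_interval(rows, interval_seconds=30):
    """Find the window of `interval_seconds` that contains the most drops.

    rows: iterable of (timestamp, is_dropped) tuples (hypothetical shape).
    Returns (window_start, drop_count), or None if there are no drops.
    Every candidate window starts at a dropped row, as in the pseudocode.
    """
    window = timedelta(seconds=interval_seconds)
    # Only dropped rows matter; sort them in ascending time order.
    drops = sorted(t for t, dropped in rows if dropped)

    best_start, best_count = None, 0
    end = 0  # index one past the last drop inside the current window
    for start, start_time in enumerate(drops):
        # Extend the window while the next drop is within the interval.
        while end < len(drops) and drops[end] - start_time <= window:
            end += 1
        count = end - start  # includes the starting drop itself
        if count > best_count:
            best_start, best_count = start_time, count

    return (best_start, best_count) if best_start is not None else None
```

With the drops from the question's sample data and a 60-second interval, this returns the window starting at 2015-01-02 20:04:30 with 4 drops. The window end is simply the returned start plus the interval.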

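But of course SQL doesn't do row-by-row processing very well. To get around this, we can join the table to itself, make sure the times of the paired rows from our "two" tables fall within a certain range, and aggregate the output.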

Say we have a table Demo that looks like this:

Id  Zero  Time
0   0     '2007-12-31 11:11:11'
1   0     '2008-01-01 00:00:00'
2   0     '2008-01-01 00:00:30'
3   1     '2008-01-01 00:00:30'
4   0     '2008-01-01 00:00:31'
5   1     '2008-01-01 00:00:31'
6   0     '2008-01-01 00:00:32'
7   0     '2008-01-01 11:11:11'

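For each row where row.Zero = 0, we want to find all rows where Zero = 0 whose time is no more than N seconds after that first row's time. So if your interval is 30 seconds, the query might look like this: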

SELECT a.Id, a.Time, b.Id, b.Time
FROM Demo a
INNER JOIN Demo b
  ON a.Zero = b.Zero
  AND a.Time <= b.Time
WHERE a.Zero = 0
  AND TIMESTAMPDIFF(SECOND, a.Time, b.Time) <= 30
ORDER BY a.Id, b.Time
;

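This gives us a set of rows containing 1) the Id of the zero row that defines the start of the interval, 2) the time of that starting row, 3) the Id of another row in that interval, and 4) the time of that row: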

Id  Time                            Id  Time
0   'December, 31 2007 11:11:11'    0   'December, 31 2007 11:11:11'
1   'January, 01 2008 00:00:00'     1   'January, 01 2008 00:00:00'
1   'January, 01 2008 00:00:00'     2   'January, 01 2008 00:00:30'
2   'January, 01 2008 00:00:30'     2   'January, 01 2008 00:00:30'
2   'January, 01 2008 00:00:30'     4   'January, 01 2008 00:00:31'
2   'January, 01 2008 00:00:30'     6   'January, 01 2008 00:00:32'
4   'January, 01 2008 00:00:31'     4   'January, 01 2008 00:00:31'
4   'January, 01 2008 00:00:31'     6   'January, 01 2008 00:00:32'
6   'January, 01 2008 00:00:32'     6   'January, 01 2008 00:00:32'
7   'January, 01 2008 11:11:11'     7   'January, 01 2008 11:11:11'

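We can see that non-zero rows are excluded entirely, and that a row only matches a starting row if its time is between 0 and 30 seconds (inclusive) later than the starting row's. So far so good! But we also want to count these results by the starting row's Id. So we'll make the query aggregate the results, like so: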

SELECT a.Id, a.Time, COUNT(b.Id) numDropped
FROM Demo a
INNER JOIN Demo b
  ON a.Time <= b.Time
  AND a.Zero = b.Zero
WHERE a.Zero = 0
  AND TIMESTAMPDIFF(SECOND, a.Time, b.Time) <= 30
GROUP BY a.Id
;

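This gives us rows containing 1) the Id of the zero row that defines the start of the interval, 2) the time of that starting row, and 3) the number of zero rows in the interval, including the starting row: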

Id  Time                            numDropped
0   'December, 31 2007 11:11:11'    1
1   'January, 01 2008 00:00:00'     2
2   'January, 01 2008 00:00:30'     3
4   'January, 01 2008 00:00:31'     2
6   'January, 01 2008 00:00:32'     1
7   'January, 01 2008 11:11:11'     1

To get the "worst" one, we can simply take the previous query, sort by numDropped in descending order, and take the first row:

ORDER BY numDropped DESC
LIMIT 1
;

Which gives us:

Id  Time                            numDropped
2   'January, 01 2008 00:00:30'     3
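
To try the whole query without a MySQL server, here is a small sketch that runs it against the Demo data using Python's built-in sqlite3 module. This is a stand-in, not the MySQL query itself: SQLite has no TIMESTAMPDIFF, so the difference of strftime('%s', ...) values is used instead, and the helper names build_demo, counts_per_start and worst_window are made up for this example.

```python
import sqlite3

# Same join and aggregation as above; only the time-difference expression
# is changed because SQLite lacks TIMESTAMPDIFF (assumption of this sketch).
COUNT_SQL = """
SELECT a.Id, a.Time, COUNT(b.Id) AS numDropped
FROM Demo a
INNER JOIN Demo b
  ON a.Zero = b.Zero
  AND a.Time <= b.Time
WHERE a.Zero = 0
  AND CAST(strftime('%s', b.Time) AS INTEGER)
      - CAST(strftime('%s', a.Time) AS INTEGER) <= ?
GROUP BY a.Id, a.Time
"""


def build_demo():
    """Create an in-memory copy of the Demo table from this answer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Demo (Id INTEGER PRIMARY KEY, Zero INTEGER, Time TEXT)"
    )
    conn.executemany("INSERT INTO Demo VALUES (?, ?, ?)", [
        (0, 0, "2007-12-31 11:11:11"),
        (1, 0, "2008-01-01 00:00:00"),
        (2, 0, "2008-01-01 00:00:30"),
        (3, 1, "2008-01-01 00:00:30"),
        (4, 0, "2008-01-01 00:00:31"),
        (5, 1, "2008-01-01 00:00:31"),
        (6, 0, "2008-01-01 00:00:32"),
        (7, 0, "2008-01-01 11:11:11"),
    ])
    return conn


def counts_per_start(conn, seconds=30):
    """Number of zero rows in the window starting at each zero row."""
    return conn.execute(COUNT_SQL + " ORDER BY a.Id", (seconds,)).fetchall()


def worst_window(conn, seconds=30):
    """Start row of the window with the most zero rows."""
    sql = COUNT_SQL + " ORDER BY numDropped DESC LIMIT 1"
    return conn.execute(sql, (seconds,)).fetchone()
```

Calling worst_window(build_demo()) returns the row for Id 2 with a count of 3, matching the result shown above.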

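You now have the start time of the worst interval, along with the Id of the first connection attempt and the number of dropped connections in that interval! If you'd like the query to return the end time of the worst interval as well (rather than computing it in the consuming program), you can additionally SELECT a.Time + INTERVAL 30 SECOND. Again, swap out 30 for however many seconds long your interval should be.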

A few quick side notes:

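1) You'll notice that zero rows join to themselves, which doesn't quite match the earlier concept of processing subsequent rows. But we need this: what if the worst interval has only a single dropped connection? So every zero row needs to be able to include itself in its list of nearby zero rows.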

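2) Joining on a.Time <= b.Time avoids creating duplicate joined rows that we know we don't want anyway, so the query doesn't have to waste time processing them. However, you could remove that clause and replace the timestamp check with the more explicit TIMESTAMPDIFF(SECOND, a.Time, b.Time) BETWEEN 0 AND 30, and you'd get the same results.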

Answer 2

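If you want to find the calendar half-minute with the worst connection, that's easy to do with an aggregation query. Something like this: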

-- multiply back by 30 so periodstart is the real start of the half-minute
select FROM_UNIXTIME(floor(UNIX_TIMESTAMP(datetime) / 30) * 30) as periodstart,
       count(*) as numrows,
       sum(column1 = 0 and column2 = 0 and column3 = 0) as numallzeros
from table t
group by floor(UNIX_TIMESTAMP(datetime) / 30)
order by numallzeros desc;

If you want to define the periods flexibly, it's harder. If so, you need to explain in the question how that should be done.
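
To make the difference between calendar periods and flexible periods concrete, here is a minimal Python sketch of the same bucketing idea. The function name worst_bucket is invented for this example, and it assumes the timestamps are naive datetimes that can be treated as UTC, whereas MySQL's UNIX_TIMESTAMP uses the session time zone.

```python
from collections import Counter
from datetime import datetime, timezone


def worst_bucket(drop_times, bucket_seconds=30):
    """Return (bucket_start, drop_count) for the fullest calendar bucket.

    drop_times: naive datetimes of the all-zero rows, treated as UTC
    (an assumption of this sketch). Returns None if there are no drops.
    On a tie, the earliest bucket wins.
    """
    counts = Counter()
    for t in drop_times:
        epoch = int(t.replace(tzinfo=timezone.utc).timestamp())
        # Round down to the start of the bucket, like floor(ts / 30) * 30.
        counts[epoch - epoch % bucket_seconds] += 1
    if not counts:
        return None
    start, n = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    start_time = datetime.fromtimestamp(start, tz=timezone.utc)
    return start_time.replace(tzinfo=None), n
```

For the drops in the question's sample data (20:04:30, 20:04:50, 20:05:00 and 20:05:10 on 2015-01-02), 60-second buckets report at most 2 drops, because the minute boundary at 20:05:00 splits the cluster in half. A window that is allowed to start at any drop would see all four.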

Finding the date range with the most occurrences
