Question

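I'm trying to gather simple statistics about my data, such as how long a service was online, how long it was offline, averages, and so on. I have found a few solutions, but they all depend on certain conditions, such as the rows being back to back (ROW_NUMBER -1) or there being only two states.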

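My data arrives in the form of logs, always after the fact (i.e. there is no live data). The biggest problem I need to figure out is that there are more than two states. Currently there are four possible states (enabled, disabled, active, inactive), and I want to be able to collect data on each of them.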

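I am given the data one row at a time, containing the service name, the old status, the new status, and a timestamp. Currently the data is stored in a single table. I cannot change how the data is provided, but I can change how it is stored, and I am starting to think that the table is my main setback.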

Here is an example of how the data currently ends up in my table:

CREATE TABLE IF NOT EXISTS statusupdates (
  sid int UNIQUE, 
  fullname VARCHAR(64), 
  oldstatus VARCHAR(16), 
  newstatus VARCHAR(16), 
  time TIMESTAMP);

INSERT INTO statusupdates VALUES
(null, 'fictHTTP', 'Off', 'On', '2017-01-01 02:20:00'),
(null, 'faked', 'On', 'Inactive', '2017-01-01 02:25:00'),
(null, 'ipsum', 'Inactive', 'On', '2017-01-01 02:30:00'),
(null, 'resultd', 'On', 'Inactive', '2017-01-01 02:35:00'),
(null, 'ipsum', 'On', 'Active', '2017-01-01 02:40:00'),
(null, 'fictHTTP', 'On', 'Active', '2017-01-01 02:45:00'),
(null, 'faked', 'Inactive', 'Off', '2017-01-01 02:50:00'),
(null, 'ipsum', 'Active', 'Off', '2017-01-01 02:55:00'),
(null, 'resultd', 'Inactive', 'Off', '2017-01-01 03:00:00');

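One approach I believe I have found is to narrow it down to a single item, for example resultd, with something like SELECT fullname, newstatus, time FROM statusupdates WHERE fullname='resultd' ORDER BY time DESC;. Then, using that data, I would run another query the same way but one step further along (since it is descending) and take the newstatus from that record. As I type this out, it seems sloppy.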

Or I could grab the oldstatus and then, in a second query, use it to find the newstatus of the following record. But again, that may be sloppy.

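I know there is a way to combine these two theoretical queries. All in all, I am in way over my head, so please forgive me! In the end I would like to see statistics such as total time and average time for each status. My biggest hurdle right now is getting a query to give me results such as every timestamp entry for ipsum, so that I can get the duration from the previous entry, and then repeat this until it has gone through all the records.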

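Or maybe, just maybe, I am thinking about this all wrong and overcomplicating it by shoving all the data into one table; I have already done that twice on this project for unrelated items.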

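An additional thought: for a single instance, I could do SELECT old_status, new_status, time FROM statusupdates WHERE time = '2017-01-01 03:00:00' and then use the old_status like this: SELECT old_status, new_status, time FROM statusupdates WHERE time < 'timeStamp' AND new_status = 'oldStatus'. Subtracting the two timestamps would give me the data for one example. But how do I then do that for the next one, and the next, until it has hit all of them?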

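Update, another idea: combining some of your great suggestions, what about reading the logs backwards? Never mind, at this point the direction they are read in does not matter. When a status is encountered, create an incomplete record containing the old_status and the time_stamp as end_time. Then, when that service is encountered again, check that new_status = old_status and update the record with the time_stamp as start_time.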

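This seems like it would cause a lot of overhead: every record has to be checked for existence, created if it does not exist, and updated if it does. Or maybe that is not too bad?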

Answer 1

Do you have access to window functions in your database? If so, you can get the value of the next row for each record (partitioned by fullname):

  select  fullname,
          newstatus,
          avg( time_diff ) as avg_time
  from    (
            select  fullname,
                    oldstatus,
                    newstatus,
                    time,
                    /* time until the next row for this fullname; null on its last row.
                       A column alias cannot be reused in the same select list, so the
                       subtraction is applied to lead() directly. Timestamp subtraction is
                       dialect-specific: PostgreSQL returns an interval, while MySQL needs
                       timestampdiff() instead. */
                    lead( time ) over(
                      partition by fullname
                      order by time
                    ) - time as time_diff
            from    statusupdates
          ) as a
   group by fullname,
          newstatus
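
To make the next-row idea concrete, here is a rough Java sketch of the same computation run over the sample rows from the question. It is only an illustration, not something the database does internally: the class name `NextRowAverages`, the `Row` holder, and the `fullname|newstatus` key format are all made up for this example. One difference from the query: groups whose only row has no following row (for example fictHTTP in Active) are simply left out here, whereas the SQL would report them with a null average.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class NextRowAverages {

    /** Hypothetical holder for one statusupdates row; oldstatus is not needed here. */
    static final class Row {
        final String fullname;
        final String newstatus;
        final LocalDateTime time;

        Row(String fullname, String newstatus, String isoTime) {
            this.fullname = fullname;
            this.newstatus = newstatus;
            this.time = LocalDateTime.parse(isoTime);
        }
    }

    /** The nine sample rows from the question. */
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("fictHTTP", "On", "2017-01-01T02:20:00"));
        rows.add(new Row("faked", "Inactive", "2017-01-01T02:25:00"));
        rows.add(new Row("ipsum", "On", "2017-01-01T02:30:00"));
        rows.add(new Row("resultd", "Inactive", "2017-01-01T02:35:00"));
        rows.add(new Row("ipsum", "Active", "2017-01-01T02:40:00"));
        rows.add(new Row("fictHTTP", "Active", "2017-01-01T02:45:00"));
        rows.add(new Row("faked", "Off", "2017-01-01T02:50:00"));
        rows.add(new Row("ipsum", "Off", "2017-01-01T02:55:00"));
        rows.add(new Row("resultd", "Off", "2017-01-01T03:00:00"));
        return rows;
    }

    /** Average minutes per "fullname|newstatus", skipping rows that have no next row. */
    static Map<String, Double> averageMinutes(List<Row> rows) {
        // "partition by fullname"
        Map<String, List<Row>> partitions = new HashMap<>();
        for (Row r : rows) {
            partitions.computeIfAbsent(r.fullname, k -> new ArrayList<>()).add(r);
        }
        Map<String, long[]> sumAndCount = new TreeMap<>();
        for (List<Row> partition : partitions.values()) {
            // "order by time"
            partition.sort(Comparator.comparing((Row r) -> r.time));
            // "lead(time)": pair each row with the row that follows it
            for (int i = 0; i + 1 < partition.size(); i++) {
                Row current = partition.get(i);
                Row next = partition.get(i + 1);
                long minutes = Duration.between(current.time, next.time).toMinutes();
                String key = current.fullname + "|" + current.newstatus;
                long[] acc = sumAndCount.computeIfAbsent(key, k -> new long[2]);
                acc[0] += minutes; // running sum
                acc[1]++;          // row count
            }
        }
        // "group by fullname, newstatus" with avg()
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, long[]> e : sumAndCount.entrySet()) {
            averages.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        averageMinutes(sampleRows())
            .forEach((key, avg) -> System.out.println(key + " -> " + avg + " min"));
    }
}
```

On the sample data this gives ipsum 10 minutes in On and 15 minutes in Active, and 25 minutes each for fictHTTP in On, faked in Inactive, and resultd in Inactive.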

Edit

Without window functions, you can get next_time in a slightly more complicated way:

select  a.*,
        b.next_time
from    statusupdates as a
        left join (
          select  a.fullname,
                  a.time,
                  min( b.time ) as next_time
          from    statusupdates as a
                  left join statusupdates as b
                    on  a.fullname = b.fullname
                    and a.time < b.time
          group by a.fullname,
                  a.time
        ) as b
          on  a.fullname = b.fullname
          and a.time = b.time
;

Answer 2

You could reconsider the data structure:

statusUpdate {
  fullName,
  oldStatus,
  newStatus,
  startTime,
  endTime
}

Now you can easily fire off SQL queries to get statistics. For example:

select sum(endTime - startTime) from statusUpdate where oldStatus='active' group by fullName

If you have no control over the database, you could create one in memory, but that will be very expensive if the amount of data is large.
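
As a rough sketch of how the incoming transition rows could be turned into records with a start and an end time, something like the following might work. It assumes the rows arrive in ascending time order for each service. The names `IntervalBuilder`, `Interval`, `accept`, `closedIntervals`, and `totalMinutes` are invented for this example, and the interval holds a single `status` field rather than the oldStatus/newStatus pair shown above, which is a simplification on my part.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IntervalBuilder {

    /** Hypothetical interval row: the service was in `status` from startTime to endTime. */
    static final class Interval {
        final String fullName;
        final String status;
        final LocalDateTime startTime;
        final LocalDateTime endTime; // null while the interval is still open

        Interval(String fullName, String status, LocalDateTime startTime, LocalDateTime endTime) {
            this.fullName = fullName;
            this.status = status;
            this.startTime = startTime;
            this.endTime = endTime;
        }
    }

    // The one still-open interval per service, keyed by service name.
    private final Map<String, Interval> open = new HashMap<>();
    private final List<Interval> closed = new ArrayList<>();

    /** Feed one log row; rows must arrive in ascending time order per service. */
    void accept(String fullName, String oldStatus, String newStatus, LocalDateTime time) {
        Interval previous = open.get(fullName);
        if (previous != null) {
            // Sanity check: this row should continue from the previous row's newStatus.
            if (!previous.status.equals(oldStatus)) {
                throw new IllegalStateException(
                    "oldStatus does not match the previous newStatus for " + fullName);
            }
            // The previous status lasted until this row's timestamp.
            closed.add(new Interval(fullName, previous.status, previous.startTime, time));
        }
        // Open a new interval for the status the service has just entered.
        open.put(fullName, new Interval(fullName, newStatus, time, null));
    }

    List<Interval> closedIntervals() {
        return closed;
    }

    /** Total minutes a service spent in a status, over closed intervals only. */
    long totalMinutes(String fullName, String status) {
        long total = 0;
        for (Interval i : closed) {
            if (i.fullName.equals(fullName) && i.status.equals(status)) {
                total += Duration.between(i.startTime, i.endTime).toMinutes();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        IntervalBuilder builder = new IntervalBuilder();
        builder.accept("ipsum", "Inactive", "On", LocalDateTime.parse("2017-01-01T02:30:00"));
        builder.accept("ipsum", "On", "Active", LocalDateTime.parse("2017-01-01T02:40:00"));
        builder.accept("ipsum", "Active", "Off", LocalDateTime.parse("2017-01-01T02:55:00"));
        System.out.println(builder.totalMinutes("ipsum", "On"));     // 10
        System.out.println(builder.totalMinutes("ipsum", "Active")); // 15
    }
}
```

Each incoming row costs one map lookup and one map write, so the check-then-update overhead the question worries about should stay small. The closed intervals could then be inserted into a table shaped like the structure above.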

Edit

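So far Alex's solution seems to be the best, but if the database is completely out of your control, you can try building your statistics while parsing the log file, since the log file is guaranteed to list the records sorted by time. This would probably take up less memory and can be fine-tuned further.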

import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TestClass {

    // Assumed shape of one parsed log record (the original does not show this class).
    static class Status {
        String fullName;
        String oldStatus;
        String newStatus;
        Date time;
    }

    static class Aggregation {
        String fullName;
        String currStatus;
        Date currTime;
        // Every duration (in ms) spent in each status, and the running total per status.
        Map<String, List<Long>> timePeriodListMap = new HashMap<>();
        Map<String, Long> totalTimeMap = new HashMap<>();

        public void add(Status status) {
            if (currTime == null) {
                // First record for this service: nothing to measure yet.
                fullName = status.fullName;
            } else {
                if (!fullName.equals(status.fullName)) {
                    throw new RuntimeException("Wrong " + fullName);
                }
                if (!currStatus.equals(status.oldStatus)) {
                    throw new RuntimeException("Previous record's newStatus is not this record's oldStatus");
                }
                if (currTime.compareTo(status.time) > 0) {
                    throw new RuntimeException("Unsorted by time");
                }
                // The service stayed in currStatus from currTime until this record.
                long elapsed = status.time.getTime() - currTime.getTime();
                timePeriodListMap.computeIfAbsent(currStatus, k -> new ArrayList<>()).add(elapsed);
                totalTimeMap.merge(currStatus, elapsed, Long::sum);
            }
            currStatus = status.newStatus;
            currTime = status.time;
        }
    }

    // One aggregation per service, keyed by fullName.
    Map<String, Aggregation> statusDB = new HashMap<>();

    // Read from the file as status one by one.
    public void process(Status status) {
        statusDB.computeIfAbsent(status.fullName, k -> new Aggregation()).add(status);
    }
}

Getting the average between timestamps with multiple statuses
