Question

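I've just received a new data source for my application, which inserts data into a Derby database only when the data changes. Normally, missing data is fine: I'm drawing a line chart with the data (value over time), and I'd just draw a line between two points, interpolating the expected value at any given point. The problem is that in this case missing data means "draw a straight line" (the value held steady until the next change), so the chart would be incorrect if I did that.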

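There are two ways I could fix this: I could create a new class that handles missing data differently (which could be difficult because of the way prefuse, the drawing library I'm using, handles drawing), or I could duplicate the rows, keeping the y value the same while changing the x value in each row. I could do this in the Java that bridges the database and the renderer, or I could modify the SQL.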

My question is, given a result set like the one below:

+-------+---------------------+
| value | received            |
+-------+---------------------+
|     7 | 2000-01-01 08:00:00 |
|    10 | 2000-01-01 08:00:05 |
|    11 | 2000-01-01 08:00:07 |
|     2 | 2000-01-01 08:00:13 |
|     4 | 2000-01-01 08:00:16 |
+-------+---------------------+

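Assuming I query it at 8:00:20, how can I use SQL to make it look like the following? Basically, I'm repeating each row for every second until the next second that already has a row. For all intents and purposes, received is unique (it isn't, but it will be thanks to the WHERE clause in the query).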

+-------+---------------------+
| value | received            |
+-------+---------------------+
|     7 | 2000-01-01 08:00:00 |
|     7 | 2000-01-01 08:00:01 |
|     7 | 2000-01-01 08:00:02 |
|     7 | 2000-01-01 08:00:03 |
|     7 | 2000-01-01 08:00:04 |
|    10 | 2000-01-01 08:00:05 |
|    10 | 2000-01-01 08:00:06 |
|    11 | 2000-01-01 08:00:07 |
|    11 | 2000-01-01 08:00:08 |
|    11 | 2000-01-01 08:00:09 |
|    11 | 2000-01-01 08:00:10 |
|    11 | 2000-01-01 08:00:11 |
|    11 | 2000-01-01 08:00:12 |
|     2 | 2000-01-01 08:00:13 |
|     2 | 2000-01-01 08:00:14 |
|     2 | 2000-01-01 08:00:15 |
|     4 | 2000-01-01 08:00:16 |
|     4 | 2000-01-01 08:00:17 |
|     4 | 2000-01-01 08:00:18 |
|     4 | 2000-01-01 08:00:19 |
|     4 | 2000-01-01 08:00:20 |
+-------+---------------------+

Thanks for any help.

Answer 1

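Due to the set-based nature of SQL, there is no simple way to do this. I have used two solution strategies:

a) Use a loop that runs from the initial date-time to the end date-time, fetch the value for each step, and insert it into a temp table.

b) Generate a table (normal or temporary) in 1-minute increments; adding the base date-time to the entries in this table generates the steps.

Example of approach b) (SQL Server version)

Let's assume we will never query more than 24 hours of data. We create a table intervals with a dttm field that holds the number of minutes for each step. That table must be populated beforehand.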

select dateadd(minute, i.dttm, '2000-01-01 08:00') received,
       -- latest known value at or before this step; Data stands in for the source table
       (select top 1 d.value
        from Data d
        where d.received <= dateadd(minute, i.dttm, '2000-01-01 08:00')
        order by d.received desc) value
from intervals i

Answer 2

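It doesn't look like you really need to generate all of these data points in this case. Would it be correct to generate the following instead? If the chart draws a straight line, you don't need a data point for every second, just two per data point: one at its own time and one right before the next time. The query below subtracts 5 ms from the next time, but you could make it a full second if you need to (the table shows full seconds).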

+-------+---------------------+
| value | received            |
+-------+---------------------+
|     7 | 2000-01-01 08:00:00 |
|     7 | 2000-01-01 08:00:04 |
|    10 | 2000-01-01 08:00:05 |
|    10 | 2000-01-01 08:00:06 |
|    11 | 2000-01-01 08:00:07 |
|    11 | 2000-01-01 08:00:12 |
|     2 | 2000-01-01 08:00:13 |
|     2 | 2000-01-01 08:00:15 |
|     4 | 2000-01-01 08:00:16 |
|     4 | 2000-01-01 08:00:20 |
+-------+---------------------+

If that's the case, then you can do the following:

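SELECT t5.value, t5.received
FROM (
    -- the original data points
    SELECT t1.value, t1.received
    FROM TimeTable AS t1
    UNION
    -- each value repeated 5 ms before the next data point
    SELECT t2.value, DATEADD(ms, -5, t2.received)
    FROM (SELECT t3.value,
                 (SELECT TOP 1 t4.received
                  FROM TimeTable t4
                  WHERE t4.received > t3.received
                  ORDER BY t4.received ASC) AS received
          FROM TimeTable t3) AS t2
    UNION
    -- the latest value repeated at the current time; TOP 1 sits in a derived
    -- table because ORDER BY inside a UNION applies to the whole union
    SELECT t6.value, GETDATE()
    FROM (SELECT TOP 1 value
          FROM TimeTable
          ORDER BY received DESC) AS t6
) AS t5
WHERE t5.received IS NOT NULL
ORDER BY t5.received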

The big advantage of this is that it is a set-based solution, which should generally be much faster than an iterative approach.
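The same two-points-per-change idea can also be applied in the Java layer that the question mentions, between the database and the renderer. The following is only a minimal sketch under assumptions: the class StepPoints, its Point type, and the toSteps method are hypothetical names and not part of prefuse or Derby. It assumes the rows are already sorted by received and that the gap is shorter than the spacing between consecutive rows.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of prefuse or Derby.
final class StepPoints {

    // One (value, received) sample; the field names mirror the question's columns.
    static final class Point {
        final int value;
        final LocalDateTime received;

        Point(int value, LocalDateTime received) {
            this.value = value;
            this.received = received;
        }

        @Override
        public String toString() {
            return value + " @ " + received;
        }
    }

    // Assumes rows are sorted by received with no duplicate timestamps.
    static List<Point> toSteps(List<Point> rows, LocalDateTime now, Duration gap) {
        List<Point> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            Point p = rows.get(i);
            out.add(p);
            boolean last = (i == rows.size() - 1);
            // Hold the value until just before the next change,
            // or until "now" for the last row.
            LocalDateTime holdUntil = last ? now : rows.get(i + 1).received.minus(gap);
            if (holdUntil.isAfter(p.received)) {
                out.add(new Point(p.value, holdUntil));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        LocalDateTime t0 = LocalDateTime.of(2000, 1, 1, 8, 0, 0);
        List<Point> rows = new ArrayList<>();
        rows.add(new Point(7, t0));
        rows.add(new Point(10, t0.plusSeconds(5)));
        rows.add(new Point(11, t0.plusSeconds(7)));
        rows.add(new Point(2, t0.plusSeconds(13)));
        rows.add(new Point(4, t0.plusSeconds(16)));
        for (Point p : toSteps(rows, t0.plusSeconds(20), Duration.ofSeconds(1))) {
            System.out.println(p);
        }
    }
}
```

With a one-second gap and the question's five rows this yields ten points, the same shape as the table in this answer. The renderer then draws flat segments followed by near-vertical jumps without needing one row per second.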

Answer 3

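You could just walk a cursor, keeping variables for the last value and time returned. If the current row is more than a second later, loop one second at a time with the previous value and the new time until you reach the current row's time.

Trying to do this in SQL would be painful, and if you do create the missing data, you may need to add a column to track which data points are real and which are interpolated.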
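The cursor-style walk translates directly to application code. Below is a minimal sketch, assuming the rows arrive sorted by received and on whole-second timestamps. GapFiller, Sample, and the filled flag are hypothetical names introduced only for illustration; the flag plays the role of the extra column for real versus generated points.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the walk; not an API of Derby or prefuse.
final class GapFiller {

    static final class Sample {
        final int value;
        final LocalDateTime received;
        final boolean filled; // true when the row was generated rather than read

        Sample(int value, LocalDateTime received, boolean filled) {
            this.value = value;
            this.received = received;
            this.filled = filled;
        }

        @Override
        public String toString() {
            return value + " @ " + received + (filled ? " (filled)" : "");
        }
    }

    // Assumes rows are sorted by received and timestamps fall on whole seconds.
    static List<Sample> fillEverySecond(List<Sample> rows, LocalDateTime until) {
        List<Sample> out = new ArrayList<>();
        Sample previous = null;
        for (Sample current : rows) {
            if (previous != null) {
                appendFill(out, previous, current.received);
            }
            out.add(current);
            previous = current;
        }
        if (previous != null) {
            // Carry the last value forward up to and including "until".
            appendFill(out, previous, until.plusSeconds(1));
        }
        return out;
    }

    // Repeats from.value once per second, strictly after from.received
    // and strictly before stop.
    private static void appendFill(List<Sample> out, Sample from, LocalDateTime stop) {
        for (LocalDateTime t = from.received.plusSeconds(1);
             t.isBefore(stop);
             t = t.plusSeconds(1)) {
            out.add(new Sample(from.value, t, true));
        }
    }

    public static void main(String[] args) {
        LocalDateTime t0 = LocalDateTime.of(2000, 1, 1, 8, 0, 0);
        List<Sample> rows = new ArrayList<>();
        rows.add(new Sample(7, t0, false));
        rows.add(new Sample(10, t0.plusSeconds(5), false));
        rows.add(new Sample(11, t0.plusSeconds(7), false));
        rows.add(new Sample(2, t0.plusSeconds(13), false));
        rows.add(new Sample(4, t0.plusSeconds(16), false));
        for (Sample s : fillEverySecond(rows, t0.plusSeconds(20))) {
            System.out.println(s);
        }
    }
}
```

For the question's five rows and an end time of 08:00:20, this produces one row per second from 08:00:00 through 08:00:20, matching the desired result set.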

Answer 4

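One way to handle this is to left join your data against a table that contains every received timestamp you want. Then, when there is no value for a row, you calculate the projected value from the previous and next actual values you have.

You didn't say which database platform you are using. In SQL Server, I would create a user-defined function that accepts a start datetime and an end datetime. It would return a table containing all of the received timestamps you need.

I have simulated it below; this runs in SQL Server. The subselect aliased r is what the user-defined function would actually return.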

select r.received,
isnull(d.value,(select top 1 data.value from data where data.received < r.received order by data.received desc)) as x
from (
    select cast('2000-01-01 08:00:00' as datetime) received
    union all
    select cast('2000-01-01 08:00:01' as datetime)
    union all
    select cast('2000-01-01 08:00:02' as datetime)
    union all
    select cast('2000-01-01 08:00:03' as datetime)
    union all
    select cast('2000-01-01 08:00:04' as datetime)
    union all
    select cast('2000-01-01 08:00:05' as datetime)
    union all
    select cast('2000-01-01 08:00:06' as datetime)
    union all
    select cast('2000-01-01 08:00:07' as datetime)
    union all
    select cast('2000-01-01 08:00:08' as datetime)
    union all
    select cast('2000-01-01 08:00:09' as datetime)
    union all
    select cast('2000-01-01 08:00:10' as datetime)
    union all
    select cast('2000-01-01 08:00:11' as datetime)
    union all
    select cast('2000-01-01 08:00:12' as datetime)
    union all
    select cast('2000-01-01 08:00:13' as datetime)
    union all
    select cast('2000-01-01 08:00:14' as datetime)
    union all
    select cast('2000-01-01 08:00:15' as datetime)
    union all
    select cast('2000-01-01 08:00:16' as datetime)
    union all
    select cast('2000-01-01 08:00:17' as datetime)
    union all
    select cast('2000-01-01 08:00:18' as datetime)
    union all
    select cast('2000-01-01 08:00:19' as datetime)
    union all
    select cast('2000-01-01 08:00:20' as datetime)
) r
left outer join Data d on r.received = d.received

Answer 5

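The best option would be to have a table with a row for every axis value you want on the chart, and then either join to it or even just put the data field there and update that record when values arrive.

The "missing values" problem is quite broad, so I suggest you settle on a solid policy.

One thing that will happen is that you will get several adjacent slots with missing values.

This would be much easier if you could transform the data into OLAP data.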

Answer 6

Create a simple table that contains all the minutes (warning: this will run for a while):

Create Table Minutes(Value DateTime Not Null)
Go

Declare @D DateTime
Set @D = '1/1/2000'

While (Year(@D) < 2002)
Begin
  Insert Into Minutes(Value) Values(@D)
  Set @D = DateAdd(Minute, 1, @D)
End
Go


Create Clustered Index IX_Minutes On Minutes(Value)
Go

Then you can use it:

Select 
  Received = Minutes.Value,
  Value = (Select Top 1 Data.Value
           From Data
           Where Data.Received <= Minutes.Value
           Order By Data.Received Desc)
From
  Minutes
Where
  Minutes.Value Between @Start And @End

Answer 7

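Because SQL is set-based, I would recommend against solving this in SQL or in the database. You are also dealing with seconds here, so I guess you could end up with a lot of rows carrying the same duplicated data, all of which would have to be transferred from the database to your application.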
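One way to keep only the sparse rows on the wire is to look up the value in effect at a given time on the application side. The sketch below assumes the changes fit in memory. SparseSeries and valueAt are hypothetical names, while TreeMap.floorEntry is the standard java.util call that returns the greatest key less than or equal to the argument.

```java
import java.time.LocalDateTime;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; the class and method names are illustrative only.
final class SparseSeries {
    private final TreeMap<LocalDateTime, Integer> changes = new TreeMap<>();

    // Records that the value changed to "value" at time "received".
    void put(LocalDateTime received, int value) {
        changes.put(received, value);
    }

    // Returns the value in effect at time t: the most recent change at or
    // before t, or null when t precedes the first recorded change.
    Integer valueAt(LocalDateTime t) {
        Map.Entry<LocalDateTime, Integer> e = changes.floorEntry(t);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        LocalDateTime t0 = LocalDateTime.of(2000, 1, 1, 8, 0, 0);
        SparseSeries series = new SparseSeries();
        series.put(t0, 7);
        series.put(t0.plusSeconds(5), 10);
        series.put(t0.plusSeconds(7), 11);
        series.put(t0.plusSeconds(13), 2);
        series.put(t0.plusSeconds(16), 4);
        // Sample the series once per second without storing the repeated rows.
        for (int s = 0; s <= 20; s++) {
            System.out.println(series.valueAt(t0.plusSeconds(s)) + " @ " + t0.plusSeconds(s));
        }
    }
}
```

This performs the same "latest row at or before this time" lookup that the correlated subqueries in the SQL answers do, but only the five original rows leave the database.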

Answer 8

If you were on SQL Server, this would be a good start. I'm not sure how close Apache Derby's SQL dialect is to SQL Server's.

Usage: EXEC ElaboratedData '2000-01-01 08:00:00','2000-01-01 08:00:20'

CREATE PROCEDURE [dbo].[ElaboratedData]
  @StartDate DATETIME,
  @EndDate DATETIME
AS
  --if not a valid interval, just quit
  IF @EndDate<=@StartDate BEGIN
    SELECT 0;    
    RETURN;
  END;

  /*
  create a temp table w/the same structure as the real table.
  --*/
  CREATE TABLE #SecondIntervals(TSTAMP DATETIME, DATAPT INT);

  /*
  For each second in the interval, check to see if we have a known value.
  If we do, then use that.  If not, make one up.
  --*/ 
  DECLARE @CurrentSecond DATETIME; 
  DECLARE @KnownValue INT;
  DECLARE @MadeUpValue INT;
  SET @CurrentSecond = @StartDate;
  WHILE @CurrentSecond <= @EndDate BEGIN
    --reset on every pass: a SELECT that matches no row leaves the variable unchanged
    SET @KnownValue = NULL;

    SELECT @KnownValue=DATAPT
    FROM TESTME
    WHERE TSTAMP = @CurrentSecond;

    --test for NULL so that a real value of 0 is not treated as missing
    IF @KnownValue IS NULL BEGIN
      --ok, we have to make up a fake value
      /*
      *******Put whatever logic you want to make up a fake value here
      --*/
      SET @MadeUpValue = 99;

      INSERT INTO #SecondIntervals(
        TSTAMP
       ,DATAPT
      )
      VALUES(
        @CurrentSecond
       ,@MadeUpValue
      );
    END;  --if we had to make up a value
    --step with DATEADD rather than float arithmetic to avoid rounding drift
    SET @CurrentSecond = DATEADD(second, 1, @CurrentSecond);
  END;  --while looking thru our values

  --finally, return our generated values + real values
  SELECT TSTAMP, DATAPT FROM #SecondIntervals
  UNION ALL
  SELECT TSTAMP, DATAPT FROM TESTME
  ORDER BY TSTAMP;
GO

Answer 9

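Just as an idea, you might want to check out Anthony Molinaro's SQL Cookbook, chapter 9. He has a recipe, "Filling in Missing Dates" (see pages 278-281), that covers essentially what you are trying to do. It requires some sort of sequential handling, either via a helper table or by running the query recursively. He doesn't have examples for Derby directly, but I suspect you could adapt them to your problem (particularly the PostgreSQL or MySQL one, which seems fairly platform-agnostic).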

Is it possible to temporarily duplicate and modify rows in an SQL SELECT query?
