Question

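I have an H2 database and want to calculate the average fuel usage from the data I have recorded. The problem is that the data is very messy. It is the fuel usage data for a single car.

Here is some sample data: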

| Amount   | Date       | Start (km) | End (km) |
+----------+------------+------------+----------+
| 35.5     | 2012-02-02 | 65000      | null     |
| 36.7     | 2012-02-15 | null       | 66520    |
| 44.5     | 2012-02-18 | null       | null     |
| 33.8     | 2012-02-22 | 67000      | null     |
| 44.5     | 2013-01-22 | null       | null     |

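To calculate the average fuel usage, I first compute the difference between the lowest and highest odometer readings, using the following query: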

SELECT 
   CASEWHEN((MAX(start)-MAX(end))>0, MAX(start), MAX(end)) 
    - 
   IFNULL(MIN(start),0) 
FROM fuel;

For the next step I need SUM(Amount), but how can I sum only the rows between 65000 and 67000?

Any help is much appreciated.

Answer 1

I would approach it like this:

SELECT SUM([amount]) / SUM([end] - [start]) AS AverageFuelUsage
FROM [fuel]
WHERE [amount] IS NOT NULL
AND [start] IS NOT NULL
AND [end] IS NOT NULL

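Note: this excludes a lot of data (in your sample data, all of it), but that is important.

If you don't know how much fuel was used on a journey, that doesn't mean no fuel was used, so defaulting to 0 is a bad idea; it is better to ignore the row and rely on complete data.
If you don't know the start or end reading, you don't know the distance; again, you can't assume 0, so ignore this bad data.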
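
To see what the note above means in practice, here is a minimal sketch that runs the strict query against the sample data from the question. The `@fuel` table variable and the `CompleteRows` alias are only illustrative and are not part of the original schema. No sample row has both a start and an end reading, so the filter should leave nothing to aggregate and the ratio should come back as NULL rather than a misleading number.

```sql
-- Sample data from the question, held in a T-SQL table variable for illustration
declare @fuel table ([amount] float, [date] date, [start] int, [end] int);
insert @fuel
  values ( 35.5, '2012-02-02', 65000, null  )
        ,( 36.7, '2012-02-15', null , 66520 )
        ,( 44.5, '2012-02-18', null , null  )
        ,( 33.8, '2012-02-22', 67000, null  )
        ,( 44.5, '2013-01-22', null , null  );

-- count(*) counts the rows that survive the filter;
-- sum() over zero rows yields NULL, so the division yields NULL as well
select count(*) as CompleteRows
     , sum([amount]) / sum([end] - [start]) as AverageFuelUsage
from @fuel
where [amount] is not null
  and [start] is not null
  and [end] is not null;
```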

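If every record is missing at least one field, you could use the code below, but if even 1% of your records have complete data to work with, I wouldn't go down this route.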

SELECT AVG([amount]) / ( AVG([end]) - AVG([start]) ) AS AverageFuelUsage
FROM [fuel]

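The idea here is that, if we assume the data averages out over a large data set (i.e. most journeys cover similar distances, so the start and end readings also tend towards some average), we can work from the average of each column. I'm not a statistician and would treat this result with a lot of suspicion, but if you only have bad data to work with and need a result, this may be the best you can get.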
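
As a rough sanity check, the pieces of that expression can be inspected separately on the sample data. This is only a sketch: the `@fuel` table variable and the column aliases are made up for illustration. `AVG` and `COUNT([column])` skip NULLs, so each average is taken over a different subset of rows, which is one reason to be suspicious of the result.

```sql
-- Sample data from the question, held in a T-SQL table variable for illustration
declare @fuel table ([amount] float, [date] date, [start] int, [end] int);
insert @fuel
  values ( 35.5, '2012-02-02', 65000, null  )
        ,( 36.7, '2012-02-15', null , 66520 )
        ,( 44.5, '2012-02-18', null , null  )
        ,( 33.8, '2012-02-22', 67000, null  )
        ,( 44.5, '2013-01-22', null , null  );

select count([start]) as StartReadings  -- rows that have a start reading
     , count([end])   as EndReadings    -- rows that have an end reading
     , avg([amount])  as AvgAmount      -- averaged over all rows with an amount
     , avg([start])   as AvgStart       -- averaged over StartReadings rows only
     , avg([end])     as AvgEnd         -- averaged over EndReadings rows only
     , avg([amount]) / ( avg([end]) - avg([start]) ) as AverageFuelUsage
from @fuel;
```

Worked by hand, the sample gives an average amount of 39 over five rows, an average start of 66000 over two rows and an average end of 66520 over a single row, so roughly 39 / 520 = 0.075 per km. Note that in T-SQL `AVG` over an `int` column returns an `int`, so the odometer averages are truncated when they do not divide evenly.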

Update

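Following the discussion in the comments: if you have recorded every journey and all readings are for the same vehicle, you can find the first row with a [start] value and the last row with an [end] value, calculate the total distance travelled across all those journeys, and then sum all the fuel used along the way.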

--ideally date is unique
--if not this tries to work out the sequence of journeys based on start/end odometer readings
--if they're both null and fall on the same day as the final [end] reading, assumes the null reading journey was prior to the [end] one
declare @fuel table ([amount] float, [date] date, [start] int, [end] int)
insert @fuel
  values ( 35.5     , '2012-02-02' , 65000      , null     )
        ,( 36.7     , '2012-02-15' , null       , 66520    )
        ,( 44.5     , '2012-02-18' , null       , null     )
        ,( 33.8     , '2012-02-22' , 67000      , null     )
        ,( 44.5     , '2013-01-22' , null       , null     )

select j1.[start]
, jn.[end]
, sum(f.[amount]) [amount]
, sum(f.[amount]) / (jn.[end] - j1.[start]) LitresPerKm
, (jn.[end] - j1.[start]) / sum(f.[amount])  kmsPerLitre

from
(
    select top 1 [amount], [date], [start], [end]
    from @fuel
    where [start] is not null
    order by [start]
) j1 --first journey
cross join
(   
    select top 1 [amount], [date], [start], [end]
    from @fuel
    where [end] is not null
    order by [end] desc
) jn --last journey
inner join @fuel f
on f.[date] >= j1.[date]
and (f.[end] >= j1.[start] or f.[end] is null) --in case multiple journeys on the same day: skip any that ended before our first start
and f.[date] <= jn.[date] 
and (f.[start] <= jn.[end] or f.[start] is null) --in case multiple journeys on the same day: skip any that started after our last end
group by j1.[start],jn.[end]
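
With the sample data, the query above takes 66520 as the final reading, whereas the question expected the range 65000 to 67000. One possible variant is sketched below. It assumes that the odometer never decreases over time and that every fill-up dated between the first and last known reading belongs to the distance covered. It treats whichever of [start] or [end] is present as that row's reading. The CTE name `bounds` and its column aliases are hypothetical.

```sql
-- Sample data from the question, held in a T-SQL table variable for illustration
declare @fuel table ([amount] float, [date] date, [start] int, [end] int);
insert @fuel
  values ( 35.5, '2012-02-02', 65000, null  )
        ,( 36.7, '2012-02-15', null , 66520 )
        ,( 44.5, '2012-02-18', null , null  )
        ,( 33.8, '2012-02-22', 67000, null  )
        ,( 44.5, '2013-01-22', null , null  );

with bounds as (
    -- lowest and highest odometer reading found in either column,
    -- plus the dates of the earliest and latest rows that carry a reading
    select min(coalesce([start], [end])) as FirstKm
         , max(coalesce([end], [start])) as LastKm
         , min([date]) as FirstDate
         , max([date]) as LastDate
    from @fuel
    where [start] is not null or [end] is not null
)
select b.FirstKm
     , b.LastKm
     , sum(f.[amount]) as Amount
     -- nullif avoids a divide-by-zero error when only one reading exists
     , sum(f.[amount]) / nullif(b.LastKm - b.FirstKm, 0) as LitresPerKm
from bounds b
inner join @fuel f
   on f.[date] between b.FirstDate and b.LastDate
group by b.FirstKm, b.LastKm;
```

On the sample data this should pick 65000 and 67000 and sum the four fill-ups dated 2012-02-02 to 2012-02-22 (150.5 in total), giving roughly 0.075 per km. Whether the fill-up at the last reading should count towards that distance is a modelling decision this sketch does not settle.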
