Question

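I have times in SQLite in the form '2012-02-21 00:00:00.000000' and would like to average the times of day together. The dates don't matter, only the times. So, for example, if the data is: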

'2012-02-18 20:00:00.000000' 
'2012-02-19 21:00:00.000000' 
'2012-02-20 22:00:00.000000' 
'2012-02-21 23:00:00.000000'

The average of 20, 21, 22 and 23 should be 21.5, or 21:30 (9:30 PM in U.S. time).

Q1) Is there a best way to do this in a SELECT query in SQLite?

But more difficult: what if one or more of the datetimes crosses midnight? That will definitely happen in my data set. For example:

'2012-02-18 22:00:00.000000'
'2012-02-19 23:00:00.000000' 
'2012-02-21 01:00:00.000000'

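Now the average would be (22 + 23 + 1) / 3 = 15.33, or 15:20 (3:20 PM). But this misrepresents the data, since these events all happened at night, between 22:00 and 01:00 (10 PM to 1 AM). Really, the better approach is to average them as (22 + 23 + 25) / 3 = 23.33, or 23:20 (11:20 PM).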

Q2) Should my SELECT query take this into account, or is this something I need to code in Python?

Answer 1

What do you really want to compute?

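datetimes (or times within one day) are usually represented as real numbers
time coordinates on a 24-hour clock, however, behave like complex numbers (points on a circle)
an average of the real-number representations of the times will give you dubious results...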

I don't know what you want to do with edge cases like [1:00, 13:00], but let's consider the following example: [01:30, 06:30, 13:20, 15:30, 16:15, 16:45, 17:10]

I suggest implementing this algorithm in Python:

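convert the times to complex numbers, e.g. to points on a circle of radius 1
compute the average using vector addition
convert the angle of the resulting vector to minutes, and compute the relevance of this result (e.g. the relevance of the average of [1:00, 13:00] should be 0, whatever angle happens to be computed because of rounding errors)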

import math
def complex_average(minutes):
    # first convert the times from minutes (0:00 - 23:59) to radians
    # so we get list for quasi polar coordinates (1, radians)
    # (no point in rotating/flipping to get real polar coordinates)
    # 180° = 1/2 day = 24*60/2 minutes
    radians = [t*math.pi/(24*60/2) for t in minutes]
    xs = []
    ys = []
    for r in radians:
        # convert polar coordinates (1, r) to cartesian (x, y)
        # the vectors start at (0, 0) and end in (x, y)
        x, y = (math.cos(r), math.sin(r))
        xs.append(x)
        ys.append(y)

    # result vector = vector addition
    sum_x, sum_y = sum(xs), sum(ys)

    # convert result vector coordinates to radians, then to minutes
    # note the cumulative ROUNDING ERRORS, however
    result_radians = math.atan2(sum_y, sum_x)
    result_minutes = int(result_radians / math.pi * (24*60/2))
    if result_minutes < 0:
        result_minutes += 24*60

    # relevance = magnitude of the result vector / number of data points
    # (<0.0001 means that all vectors cancel each other, e.g. [1:00, 13:00]
    #  => result_minutes would be random due to rounding error)
    # FYI: standard_deviation = 6*60 - 6*60*relevance
    relevance = round(math.sqrt(sum_x**2 + sum_y**2) / len(minutes), 4)

    return result_minutes, relevance

And test it as follows:

# let's say the select returned a bunch of integers in minutes representing times
selected_times = [90, 390, 800, 930, 975, 1005, 1030]
# or create other test data:
#selected_times = [hour*60 for hour in [23,22,1]]

complex_avg_minutes, relevance = complex_average(selected_times)
print("complex_avg_minutes = {:02}:{:02}".format(complex_avg_minutes//60,
                                                 complex_avg_minutes%60),
      "(relevance = {}%)".format(int(round(relevance*100))))

simple_avg = int(sum(selected_times) / len(selected_times))
print("simple_avg = {:02}:{:02}".format(simple_avg//60,
                                        simple_avg%60))

hh_mm = ["{:02}:{:02}".format(t//60, t%60) for t in selected_times]
print("\ntimes = {}".format(hh_mm))

Output for my example:

complex_avg_minutes = 15:45 (relevance = 44%)
simple_avg = 12:25
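
For comparison, the same idea can be written more compactly with the standard cmath module, which makes the "times as complex numbers" step explicit. This is only a sketch: the name circular_mean_minutes is made up for illustration, and it rounds to the nearest minute instead of truncating, so its result can differ from complex_average by one minute.

```python
import cmath
import math

MINUTES_PER_DAY = 24 * 60


def circular_mean_minutes(minutes):
    """Return (mean minute of day, relevance) for times in minutes since midnight.

    Hypothetical helper, not part of the code above.
    """
    # Map each time to a unit complex number on the 24-hour "clock" circle.
    points = [cmath.rect(1.0, 2 * math.pi * m / MINUTES_PER_DAY) for m in minutes]
    mean = sum(points) / len(points)
    # phase() lies in (-pi, pi]; the modulo folds negative results into 0..1439.
    angle_in_minutes = cmath.phase(mean) * MINUTES_PER_DAY / (2 * math.pi)
    mean_minutes = round(angle_in_minutes) % MINUTES_PER_DAY
    # abs(mean) is the length of the mean vector: near 1 when the times
    # cluster together, near 0 when they cancel out (e.g. [1:00, 13:00]).
    return mean_minutes, abs(mean)


# 23:00 and 01:00 average to midnight instead of noon.
print(circular_mean_minutes([23 * 60, 1 * 60]))
```

As with complex_average, a relevance close to 0 means the returned minute value is essentially noise.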

Answer 2

I'm not sure you can average dates.

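What I would do is take the average of the difference in hours between the row values and a fixed date, then add that average back to the fixed date. Using minutes may cause an int overflow and require some kind of type conversion.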

Something like...

select dateadd(hh,avg(datediff(hh,getdate(),myrow)),getdate()) 
from mytable;
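
SQLite has no dateadd, datediff or getdate, so here is a rough sketch of how the same "average the difference to a fixed date" idea might look there, run from Python. The reference date, the in-memory table and the sample rows (taken from the question) are assumptions for illustration; julianday() and datetime() are SQLite's built-in date functions.

```python
import sqlite3

# Arbitrary fixed date standing in for getdate(); chosen only for illustration.
REFERENCE = "2012-01-01 00:00:00"

# Throwaway in-memory table with the question's first sample (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (myrow TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("2012-02-18 20:00:00.000000",),
     ("2012-02-19 21:00:00.000000",),
     ("2012-02-20 22:00:00.000000",),
     ("2012-02-21 23:00:00.000000",)],
)

# julianday() returns fractional days as a real number, so the differences
# are averaged as floats and integer overflow is not a concern.
sql = """
SELECT datetime(julianday(:ref) + avg(julianday(myrow) - julianday(:ref)))
FROM mytable
"""
(average,) = conn.execute(sql, {"ref": REFERENCE}).fetchone()
print(average)
```

Note that this averages whole datetimes, not times of day: with rows spread over several days the result lands on the morning of 2012-02-20, not around 21:30.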

Answer 3

If I understand correctly, you want the average distance of the times from midnight?

How about this?

SELECT SUM(mins) / COUNT(*) from
( SELECT
    CASE 
    WHEN strftime('%H', t) * 1 BETWEEN 0 AND 11 
    THEN (strftime('%H', t)) * 60 + strftime('%M', t)
    ELSE strftime('%H', t) * 60 + strftime('%M', t) - 24 * 60
    END mins
  FROM timestamps
);

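So we calculate the offset from midnight in minutes: from noon onwards we get a negative value, before noon a positive one. The first line averages them and gives the result in minutes. Converting it back to an hh:mm time is left as an "exercise for the student" ;-)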
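
One possible way to do that conversion is in Python, after running the query through the sqlite3 module. This is a sketch only: the helper name offset_to_hhmm and the in-memory table filled with the question's second sample are assumptions, and the query is the one above apart from layout.

```python
import sqlite3

# The query from this answer, unchanged apart from layout.
QUERY = """
SELECT SUM(mins) / COUNT(*) FROM (
  SELECT CASE
    WHEN strftime('%H', t) * 1 BETWEEN 0 AND 11
    THEN strftime('%H', t) * 60 + strftime('%M', t)
    ELSE strftime('%H', t) * 60 + strftime('%M', t) - 24 * 60
  END AS mins
  FROM timestamps
)
"""


def offset_to_hhmm(offset_minutes):
    """Turn a signed minute offset from midnight into an HH:MM string."""
    # Python's % always returns a non-negative value for a positive modulus,
    # so a negative offset (before midnight) wraps into the previous evening.
    minute_of_day = int(offset_minutes) % (24 * 60)
    return "{:02}:{:02}".format(minute_of_day // 60, minute_of_day % 60)


# Throwaway in-memory database with the question's sample rows (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timestamps (t TEXT)")
conn.executemany(
    "INSERT INTO timestamps VALUES (?)",
    [("2012-02-18 22:00:00.000000",),
     ("2012-02-19 23:00:00.000000",),
     ("2012-02-21 01:00:00.000000",)],
)
(avg_offset,) = conn.execute(QUERY).fetchone()
print(offset_to_hhmm(avg_offset))  # prints 23:20
```

Here the offsets are -120, -60 and +60 minutes, which average to -40, i.e. 40 minutes before midnight.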

Answer 4

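The Rosetta Code site has a task and code on this subject, and while researching it I came across this Wikipedia link. See the talk/discussion pages for discussion of applicability, etc.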

Get the average time of day in SQLite from datetimes
