I have times stored in SQLite in the form '2012-02-21 00:00:00.000000' and want to average them. The date doesn't matter, only the time of day. So, for example, if the data is:
'2012-02-18 20:00:00.000000'
'2012-02-19 21:00:00.000000'
'2012-02-20 22:00:00.000000'
'2012-02-21 23:00:00.000000'
Averaging 20, 21, 22 and 23 should give 21.5, i.e. 21:30 (or 9:30 PM in US time).
Q1) Is there a best way to do this in a SELECT query in SQLite?
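For the simple case, where no time crosses midnight, a minimal sketch of a Q1 answer is to pull hours and minutes out with SQLite's strftime() and average the minutes since midnight. It is run here through Python's sqlite3 module on an in-memory table; the table name `timestamps` and column `t` are assumptions, not part of the question.

```python
import sqlite3

# Hypothetical schema: timestamps(t TEXT) holding the question's strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timestamps (t TEXT)")
conn.executemany(
    "INSERT INTO timestamps (t) VALUES (?)",
    [("2012-02-18 20:00:00.000000",),
     ("2012-02-19 21:00:00.000000",),
     ("2012-02-20 22:00:00.000000",),
     ("2012-02-21 23:00:00.000000",)],
)

# Naive average of minutes since midnight; the date part is ignored.
# Only meaningful when none of the times wrap past midnight.
(avg_minutes,) = conn.execute(
    """
    SELECT AVG(CAST(strftime('%H', t) AS INTEGER) * 60
               + CAST(strftime('%M', t) AS INTEGER))
    FROM timestamps
    """
).fetchone()

hours, minutes = divmod(int(avg_minutes), 60)
print(f"{hours:02d}:{minutes:02d}")  # 21:30
```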
But harder: what if one or more of the datetimes cross midnight? They certainly will in my data set. For example:
'2012-02-18 22:00:00.000000'
'2012-02-19 23:00:00.000000'
'2012-02-21 01:00:00.000000'
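Now the average would be (22 + 23 + 1) / 3 = 15.33, i.e. 15:20 (3:20 PM). But that misrepresents the data, because these events all happened at night, between 22:00 and 01:00 (10 PM to 1 AM). The better approach would really be to average them as (22 + 23 + 25) / 3 = 23.33, i.e. 23:20 (11:20 PM).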
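A minimal sketch of the "(22 + 23 + 25) / 3" idea in plain Python: hours before an assumed cutoff (noon here; `cutoff` is a hypothetical parameter the question does not specify) are counted as belonging to the next day, and the mean is then wrapped back into the 0-24 range. It assumes the events cluster around midnight; for times around noon, e.g. 11:00 and 13:00, it returns 0.0 (midnight) instead of 12:00.

```python
def unwrapped_mean_hour(hours, cutoff=12):
    """Mean hour of day, counting hours before `cutoff` as part of the next day.

    `cutoff` is an assumption: any hour < cutoff gets +24 before averaging.
    """
    shifted = [h + 24 if h < cutoff else h for h in hours]
    return (sum(shifted) / len(shifted)) % 24


naive = sum([22, 23, 1]) / 3                   # 15.33..., i.e. 15:20
unwrapped = unwrapped_mean_hour([22, 23, 1])   # (22 + 23 + 25) / 3 = 23.33..., i.e. 23:20
print(round(naive, 2), round(unwrapped, 2))    # 15.33 23.33
```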
Q2) Should my SELECT query take this into account, or do I need to write code in Python?
Answer 0 (score: 2)
What do you actually want to compute?
I don't know what you want to do with edge cases like [1:00, 13:00], but let's consider this example: [01:30, 06:30, 13:20, 15:30, 16:15, 16:45, 17:10]
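The test code further down takes times as integer minutes since midnight. A small helper, hypothetical and not part of the original answer, shows how the example list maps to those integers, and why [1:00, 13:00] is an edge case: the two times are exactly 12 hours (720 minutes) apart, i.e. on opposite sides of the clock.

```python
def hhmm_to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight (hypothetical helper)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


example = ["01:30", "06:30", "13:20", "15:30", "16:15", "16:45", "17:10"]
print([hhmm_to_minutes(t) for t in example])
# [90, 390, 800, 930, 975, 1005, 1030]
```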
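I suggest implementing this algorithm in Python, which treats each time as a unit vector on a 24-hour clock face and adds the vectors up (for [1:00, 13:00] the relevance of the average should be 0, whatever angle gets computed because of rounding errors):
import math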
def complex_average(minutes):
    # first convert the times from minutes (0:00 - 23:59) to radians
    # so we get a list of quasi polar coordinates (1, radians)
    # (no point in rotating/flipping to get real polar coordinates)
    # 180° = 1/2 day = 24*60/2 minutes
    radians = [t * math.pi / (24*60/2) for t in minutes]

    xs = []
    ys = []
    for r in radians:
        # convert polar coordinates (1, r) to cartesian (x, y)
        # the vectors start at (0, 0) and end in (x, y)
        x, y = (math.cos(r), math.sin(r))
        xs.append(x)
        ys.append(y)

    # result vector = vector addition
    sum_x, sum_y = (sum(xs), sum(ys))

    # convert result vector coordinates to radians, then to minutes
    # note the cumulative ROUNDING ERRORS, however
    result_radians = math.atan2(sum_y, sum_x)
    result_minutes = int(result_radians / math.pi * (24*60/2))
    if result_minutes < 0:
        result_minutes += 24*60

    # relevance = magnitude of the result vector / number of data points
    # (< 0.0001 means that all vectors cancel each other, e.g. [1:00, 13:00]
    #  => result_minutes would be random due to rounding error)
    # FYI (rough heuristic): standard_deviation ~ 6*60 - 6*60*relevance minutes;
    # the textbook circular standard deviation is sqrt(-2*ln(relevance)) radians
    relevance = round(math.sqrt(sum_x**2 + sum_y**2) / len(minutes), 4)
    return result_minutes, relevance
And test it like this:
# let's say the SELECT returned a bunch of integers in minutes representing times
selected_times = [90, 390, 800, 930, 975, 1005, 1030]
# or create other test data:
#selected_times = [hour*60 for hour in [23, 22, 1]]

complex_avg_minutes, relevance = complex_average(selected_times)
print("complex_avg_minutes = {:02}:{:02}".format(complex_avg_minutes//60,
                                                 complex_avg_minutes%60),
      "(relevance = {}%)".format(int(round(relevance*100))))

simple_avg = int(sum(selected_times) / len(selected_times))
print("simple_avg = {:02}:{:02}".format(simple_avg//60,
                                        simple_avg%60))

hh_mm = ["{:02}:{:02}".format(t//60, t%60) for t in selected_times]
print("\ntimes = {}".format(hh_mm))
Output for my example:
complex_avg_minutes = 15:45 (relevance = 44%)
simple_avg = 12:25
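The same vector-averaging idea (a circular mean) can be written more compactly with Python's complex numbers: each time becomes a unit vector `cmath.exp(1j * angle)`, and the angle of their sum is the mean time. The sketch below is an alternative formulation, not the answerer's code; it also shows one possible SELECT, with assumed table/column names `timestamps`/`t`, that returns the minutes-since-midnight integers the test code expects.

```python
import cmath
import math
import sqlite3

MINUTES_PER_DAY = 24 * 60


def circular_mean_minutes(minutes):
    """Return (mean minute of day as a float, relevance in [0, 1])."""
    if not minutes:
        raise ValueError("need at least one time")
    # Each time of day becomes a unit vector on the 24-hour clock face.
    total = sum(cmath.exp(1j * 2 * math.pi * m / MINUTES_PER_DAY) for m in minutes)
    # Angle of the summed vector, mapped back to minutes in [0, 1440).
    mean = (cmath.phase(total) * MINUTES_PER_DAY / (2 * math.pi)) % MINUTES_PER_DAY
    # Length of the mean vector: 1 = all identical, close to 0 = they cancel out.
    relevance = abs(total) / len(minutes)
    return mean, relevance


# Hypothetical schema, loaded with the question's night-time rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timestamps (t TEXT)")
conn.executemany(
    "INSERT INTO timestamps (t) VALUES (?)",
    [("2012-02-18 22:00:00.000000",),
     ("2012-02-19 23:00:00.000000",),
     ("2012-02-21 01:00:00.000000",)],
)
rows = conn.execute(
    "SELECT CAST(strftime('%H', t) AS INTEGER) * 60"
    " + CAST(strftime('%M', t) AS INTEGER) FROM timestamps"
).fetchall()
night_minutes = [m for (m,) in rows]

mean, relevance = circular_mean_minutes(night_minutes)
print(f"{int(mean) // 60:02d}:{int(mean) % 60:02d}", round(relevance, 3))  # 23:19 0.947
```

For the three night-time rows the exact mean is roughly 23:19.5: the circular mean is close to, but not the same as, the 23:20 obtained by unwrapping the hours, because it averages vectors rather than numbers. Because it returns fractional minutes, for the seven-time example above it gives about 944.4 minutes (15:44), where the original's `int()` truncation of a negative angle printed 15:45.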
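Answer 1 (score: 0)
I'm not sure about averaging your dates.
What I would do is take the average of the differences in hours between each row's value and a fixed date, then add that average back to the fixed date. Using minutes instead might cause an integer overflow and require some kind of type conversion.
Something like this (SQL Server / T-SQL syntax):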
select dateadd(hh,avg(datediff(hh,getdate(),myrow)),getdate())
from mytable;
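SQLite has no `dateadd`, `datediff` or `getdate`, so a rough SQLite analogue of the same idea averages `julianday()` values (the mean offset from any fixed reference date, added back to that date, is just the plain mean) and formats the result with `datetime()`. The table/column names are assumptions. Note that this averages the whole timestamp, date included, so for the question's first data set it lands at 09:30 on 2012-02-20 rather than at 21:30.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timestamps (t TEXT)")  # hypothetical schema
conn.executemany(
    "INSERT INTO timestamps (t) VALUES (?)",
    [("2012-02-18 20:00:00.000000",),
     ("2012-02-19 21:00:00.000000",),
     ("2012-02-20 22:00:00.000000",),
     ("2012-02-21 23:00:00.000000",)],
)

# Average of the full timestamps: mean of julianday(t), converted back to text.
(avg_datetime,) = conn.execute(
    "SELECT datetime(AVG(julianday(t))) FROM timestamps"
).fetchone()
print(avg_datetime)  # 2012-02-20 09:30:00
```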
Answer 2 (score: 0)
If I understand correctly, you want the average distance of the times from midnight?
How about this?
SELECT SUM(mins) / COUNT(*) FROM
  ( SELECT
      CASE
        WHEN strftime('%H', t) * 1 BETWEEN 0 AND 11
          THEN strftime('%H', t) * 60 + strftime('%M', t)
        ELSE strftime('%H', t) * 60 + strftime('%M', t) - 24 * 60
      END mins
    FROM timestamps
  );
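So we compute each time's offset from midnight in minutes: after noon we get a negative value, before noon a positive one. The first line averages them and gives the result in minutes. Converting that back to hh:mm is left as an "exercise for the student" ;-)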
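One way to finish the "exercise" is to wrap a negative result around by adding 24 * 60 (Python's `%` does that) and format it as HH:MM. The sketch below runs answer 2's query through Python's sqlite3 on both of the question's data sets; the helper names `minutes_to_hhmm` and `average_time` are hypothetical. Note that `SUM(mins) / COUNT(*)` is integer division in SQLite, so fractional minutes are truncated toward zero.

```python
import sqlite3

# Answer 2's query, unchanged.
QUERY = """
SELECT SUM(mins) / COUNT(*) FROM
  ( SELECT
      CASE
        WHEN strftime('%H', t) * 1 BETWEEN 0 AND 11
          THEN strftime('%H', t) * 60 + strftime('%M', t)
        ELSE strftime('%H', t) * 60 + strftime('%M', t) - 24 * 60
      END mins
    FROM timestamps
  );
"""


def minutes_to_hhmm(offset):
    """Map a signed offset from midnight (in minutes) to 'HH:MM' (hypothetical helper)."""
    hours, minutes = divmod(int(offset) % (24 * 60), 60)
    return f"{hours:02d}:{minutes:02d}"


def average_time(rows):
    """Load the timestamp strings into a throwaway table and run QUERY on them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE timestamps (t TEXT)")
    conn.executemany("INSERT INTO timestamps (t) VALUES (?)", [(r,) for r in rows])
    (offset,) = conn.execute(QUERY).fetchone()
    conn.close()
    return minutes_to_hhmm(offset)


print(average_time(["2012-02-18 20:00:00.000000", "2012-02-19 21:00:00.000000",
                    "2012-02-20 22:00:00.000000", "2012-02-21 23:00:00.000000"]))  # 21:30
print(average_time(["2012-02-18 22:00:00.000000", "2012-02-19 23:00:00.000000",
                    "2012-02-21 01:00:00.000000"]))  # 23:20
```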
Answer 3 (score: 0)