I have an SQLite database with the following structure:
rowid  ID                 startTimestamp  endTimestamp  subject
1      00:50:c2:63:10:1a  1000            1090          entrance
2      00:50:c2:63:10:1a  1100            1270          entrance
3      00:50:c2:63:10:1a  1300            1310          door1
4      00:50:c2:63:10:1a  1370            1400          entrance
...
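I have prepared an sqlfiddle here: http://sqlfiddle.com/#!2/fe8c6/2
With this SQL query I can get the average difference between the endTimestamp of one row and the startTimestamp of the next row, for a given subject and ID: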
SELECT
    id,
    ( MAX(endtimestamp) - MIN(startTimestamp)
      - SUM(endtimestamp-startTimestamp)
    ) / (COUNT(*)-1) AS averageDifference
FROM
    table1
WHERE ID = '00:50:c2:63:10:1a'
  AND subject = 'entrance'
GROUP BY id;
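My question: computing the average is no problem, the query above does that. But how can I get the standard deviation and the variance of this value?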
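As a sanity check on the averaging query above: if the intervals for one ID and subject do not overlap, the total span (MAX(endTimestamp) - MIN(startTimestamp)) minus the time spent inside the intervals leaves exactly the sum of the gaps, and n rows have n - 1 gaps. The Python sketch below uses the standard sqlite3 module and only the four sample rows shown in the question, and compares that query with gaps computed row by row. The helper names and the `* 1.0` (added to avoid SQLite integer division) are assumptions for illustration, not part of the original query.

```python
import sqlite3

DEVICE = "00:50:c2:63:10:1a"

# The four sample rows from the question (the real table has more).
ROWS = [
    (DEVICE, 1000, 1090, "entrance"),
    (DEVICE, 1100, 1270, "entrance"),
    (DEVICE, 1300, 1310, "door1"),
    (DEVICE, 1370, 1400, "entrance"),
]


def make_db():
    """Build an in-memory copy of table1 holding the sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table1 (ID TEXT, startTimestamp INTEGER,"
        " endTimestamp INTEGER, subject TEXT)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    return con


def average_gap_via_query(con):
    """The question's formula; '* 1.0' is added here to force float division."""
    row = con.execute(
        """
        SELECT (MAX(endTimestamp) - MIN(startTimestamp)
                - SUM(endTimestamp - startTimestamp)) * 1.0 / (COUNT(*) - 1)
        FROM table1
        WHERE ID = ? AND subject = 'entrance'
        GROUP BY ID
        """,
        (DEVICE,),
    ).fetchone()
    return row[0]


def average_gap_directly(con):
    """Pair each row with the next one in Python and average the gaps."""
    rows = con.execute(
        "SELECT startTimestamp, endTimestamp FROM table1"
        " WHERE ID = ? AND subject = 'entrance' ORDER BY startTimestamp",
        (DEVICE,),
    ).fetchall()
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(rows, rows[1:])]
    return sum(gaps) / len(gaps)
```

The shortcut only works for the mean, because the sum of the gaps can be read off the totals. Variance needs each individual gap, which is why the per-row pairing is required.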
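Answer 0 (score: 3)
For formulas more complex than a simple sum, you have to compute the actual difference for each record by looking up the corresponding next start time, like this: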
SELECT (SELECT MIN(startTimestamp)
        FROM table1 AS next
        WHERE next.startTimestamp > table1.startTimestamp
          AND next.ID = '...'
       ) - endTimestamp AS timeDifference
FROM table1
WHERE timeDifference IS NOT NULL  -- skips the last row, which has no successor; SQLite accepts the column alias here
  AND ID = '...'
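To see what the per-row lookup returns, here is a minimal Python sketch using the standard sqlite3 module and the four sample rows from the question. The function name `gaps`, the named parameters, and the extra `subject` filter are assumptions added for illustration. The filter on the alias `timeDifference` relies on SQLite accepting a result-column alias in WHERE.

```python
import sqlite3

DEVICE = "00:50:c2:63:10:1a"

# The four sample rows from the question (the real table has more).
ROWS = [
    (DEVICE, 1000, 1090, "entrance"),
    (DEVICE, 1100, 1270, "entrance"),
    (DEVICE, 1300, 1310, "door1"),
    (DEVICE, 1370, 1400, "entrance"),
]


def make_db():
    """Build an in-memory copy of table1 holding the sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table1 (ID TEXT, startTimestamp INTEGER,"
        " endTimestamp INTEGER, subject TEXT)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    return con


def gaps(con, device, subject):
    """Return, for each row, the time until the next row's start."""
    sql = """
        SELECT (SELECT MIN(next.startTimestamp)
                FROM table1 AS next
                WHERE next.startTimestamp > table1.startTimestamp
                  AND next.ID = :device
                  AND next.subject = :subject   -- assumption: same subject only
               ) - endTimestamp AS timeDifference
        FROM table1
        WHERE timeDifference IS NOT NULL        -- last row has no successor
          AND ID = :device
          AND subject = :subject
        ORDER BY startTimestamp
    """
    params = {"device": device, "subject": subject}
    return [row[0] for row in con.execute(sql, params)]
```

Calling `gaps(make_db(), DEVICE, "entrance")` on the sample rows yields the two gaps 1100 - 1090 and 1370 - 1270.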
You can then run your calculations over all of these differences:
SELECT SUM(timeDifference) * 1.0 / COUNT(*) AS average,  -- * 1.0 avoids integer division
       AVG(timeDifference) AS moreEfficientAverage,
       SUM(timeDifference * timeDifference) * 1.0 / COUNT(*) -
       AVG(timeDifference) * AVG(timeDifference) AS variance
FROM (SELECT (SELECT MIN(startTimestamp)
              FROM table1 AS next
              WHERE next.startTimestamp > table1.startTimestamp
                AND next.ID = '...'
             ) - endTimestamp AS timeDifference
      FROM table1
      WHERE timeDifference IS NOT NULL  -- SQLite accepts the column alias here
        AND ID = '...')
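The expression "mean of squares minus square of the mean" yields the population variance (dividing by n). If the sample variance (dividing by n - 1) is wanted, the result can be rescaled by n / (n - 1). The short Python sketch below checks both against the standard `statistics` module, using the two gaps (10 and 100) between the 'entrance' rows of the question's sample data. The function and variable names are assumptions for illustration.

```python
import math
import statistics


def variance_from_moments(values):
    """Population variance as E(x^2) - (E(x))^2, mirroring the SQL expression."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n
    return mean_of_squares - mean * mean


# Gaps between consecutive 'entrance' rows in the question's sample data.
gaps = [10, 100]

population_variance = variance_from_moments(gaps)
# Rescale to the sample variance (divides by n - 1 instead of n).
sample_variance = population_variance * len(gaps) / (len(gaps) - 1)
standard_deviation = math.sqrt(population_variance)
```

This one-pass formula can lose precision when the values are very large relative to their spread. For small timestamp gaps like these it is unlikely to matter.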
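Answer 1 (score: 3)
First find the time differences of interest by joining the table to itself and grouping by ID. Then compute the average; the variance is V(x) = E(x^2) - (E(x))^2 and the standard deviation is sqrt(V).
This gives: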
SELECT ID, AVG(diff) AS average,
       AVG(diff*diff) - AVG(diff)*AVG(diff) AS variance,
       SQRT(AVG(diff*diff) - AVG(diff)*AVG(diff)) AS stdev
FROM
  (SELECT t1.id, t1.endTimestamp,
          MIN(t2.startTimestamp) - t1.endTimestamp AS diff
   FROM table1 t1
   INNER JOIN table1 t2
     ON t2.ID = t1.ID AND t2.subject = t1.subject
    AND t2.startTimestamp > t1.startTimestamp -- consider only later startTimestamps
   WHERE t1.subject = 'entrance'
   GROUP BY t1.id, t1.endTimestamp) AS diffs
GROUP BY ID
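One practical caveat: SQRT is only available in SQLite builds that include the optional math functions (added in SQLite 3.35), so the query above may fail with "no such function". A possible workaround, sketched here in Python with the standard sqlite3 module, is to register an application-defined SQRT before running the query. The helper names and the clamping of tiny negative values are assumptions for illustration; the data is the four sample rows from the question.

```python
import math
import sqlite3

DEVICE = "00:50:c2:63:10:1a"

# The four sample rows from the question (the real table has more).
ROWS = [
    (DEVICE, 1000, 1090, "entrance"),
    (DEVICE, 1100, 1270, "entrance"),
    (DEVICE, 1300, 1310, "door1"),
    (DEVICE, 1370, 1400, "entrance"),
]


def safe_sqrt(value):
    """NULL-safe square root; clamps tiny negative rounding errors to zero."""
    if value is None:
        return None
    return math.sqrt(max(value, 0.0))


def make_db():
    """In-memory table1 with a user-defined SQRT registered on the connection."""
    con = sqlite3.connect(":memory:")
    con.create_function("SQRT", 1, safe_sqrt)
    con.execute(
        "CREATE TABLE table1 (ID TEXT, startTimestamp INTEGER,"
        " endTimestamp INTEGER, subject TEXT)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    return con


def gap_statistics(con):
    """Run the self-join query: one (ID, average, variance, stdev) row per ID."""
    sql = """
        SELECT ID, AVG(diff) AS average,
               AVG(diff*diff) - AVG(diff)*AVG(diff) AS variance,
               SQRT(AVG(diff*diff) - AVG(diff)*AVG(diff)) AS stdev
        FROM (SELECT t1.ID, t1.endTimestamp,
                     MIN(t2.startTimestamp) - t1.endTimestamp AS diff
              FROM table1 t1
              INNER JOIN table1 t2
                ON t2.ID = t1.ID AND t2.subject = t1.subject
               AND t2.startTimestamp > t1.startTimestamp
              WHERE t1.subject = 'entrance'
              GROUP BY t1.ID, t1.endTimestamp) AS diffs
        GROUP BY ID
    """
    return con.execute(sql).fetchall()
```

Alternatively, the variance can be selected on its own and the square root taken in the calling code.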
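Answer 2 (score: 1)
Some points:
The average can be written as SUM(endtimestamp-starttimestamp)/COUNT(endtimestamp). I don't know why you have the MIN/MAX terms. COUNT(*) also counts rows containing NULL and would give the wrong result.
The avg function can compute the mean for you.
The variance is AVG((endtimestamp-starttimestamp)*(endtimestamp-starttimestamp)) - AVG(endtimestamp-starttimestamp)*AVG(endtimestamp-starttimestamp).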
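In response to the asker's comment: to compute the variance, the start and end times have to be paired with each other through a self-join.
This is a bit awkward because SQLite lacks a row_number function.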
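-- SQLite has no ^ power operator, so squares are written as products;
-- the variance is the mean of the squares minus the square of the mean.
SELECT id,
       AVG(startTimestamp - endTimestamp) AS mean,
       AVG((startTimestamp - endTimestamp) * (startTimestamp - endTimestamp))
         - AVG(startTimestamp - endTimestamp) * AVG(startTimestamp - endTimestamp) AS variance,
       SQRT(AVG((startTimestamp - endTimestamp) * (startTimestamp - endTimestamp))
         - AVG(startTimestamp - endTimestamp) * AVG(startTimestamp - endTimestamp)) AS stDev
FROM
  (SELECT
       t1.id,
       t1.endTimestamp,
       MIN(t2.startTimestamp) AS startTimestamp  -- start of the next interval
   FROM table1 t1
   INNER JOIN
        table1 t2 ON t2.id = t1.id               -- pair rows of the same ID only
                 AND t1.endTimestamp <= t2.startTimestamp
   GROUP BY t1.id, t1.endTimestamp) t
GROUP BY id;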
See the SQL Fiddle.
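On newer SQLite versions (3.25 and later) window functions are available, so the pairing can also be expressed with LEAD instead of a self-join. The Python sketch below, using the standard sqlite3 module and the question's four sample rows, is one possible formulation and not part of the answers above. The function name, the `gap` alias, and the `subject` filter are assumptions for illustration, and it requires that the SQLite library bundled with Python is recent enough.

```python
import sqlite3

DEVICE = "00:50:c2:63:10:1a"

# The four sample rows from the question (the real table has more).
ROWS = [
    (DEVICE, 1000, 1090, "entrance"),
    (DEVICE, 1100, 1270, "entrance"),
    (DEVICE, 1300, 1310, "door1"),
    (DEVICE, 1370, 1400, "entrance"),
]


def make_db():
    """Build an in-memory copy of table1 holding the sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table1 (ID TEXT, startTimestamp INTEGER,"
        " endTimestamp INTEGER, subject TEXT)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    return con


def gap_stats_with_lead(con, subject):
    """Return (ID, average gap, population variance) per ID using LEAD."""
    sql = """
        SELECT ID,
               AVG(gap) AS average,
               AVG(gap * gap) - AVG(gap) * AVG(gap) AS variance
        FROM (SELECT ID,
                     -- start of the following row for the same ID, minus this end
                     LEAD(startTimestamp) OVER (PARTITION BY ID
                                                ORDER BY startTimestamp)
                       - endTimestamp AS gap
              FROM table1
              WHERE subject = ?)
        WHERE gap IS NOT NULL      -- the last row per ID has no successor
        GROUP BY ID
    """
    return con.execute(sql, (subject,)).fetchall()
```

Because the subject filter is applied before the window function runs, LEAD only sees rows of that subject, which matches the pairing used in the self-join queries.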