Identifying trends with a SQL query

Time: 2014-01-02 12:39:17

Tags: sql trend

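I have a table (let's call it data) with a set of object IDs, numeric values, and dates. I want to identify the objects whose values show a positive trend over the last X minutes (say, one hour).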

Sample data:

entity_id | value | date
1234      | 15    | 2014-01-02 11:30:00
5689      | 21    | 2014-01-02 11:31:00
1234      | 16    | 2014-01-02 11:31:00

I tried looking at similar questions, but unfortunately couldn't find anything useful.

2 Answers:

Answer 0: (score: 29)

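You inspired me to implement linear regression in SQL Server. It can be adapted to MySQL / Oracle / whatever without much trouble. Fitting a least-squares line is the mathematically sound way to determine the hourly trend for each entity_id, and the query returns only the entities whose trend is positive.

It implements the formula for B1 (the slope of the fitted line) listed here: https://en.wikipedia.org/wiki/Regression_analysis#Linear_regression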
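For reference, the slope that the query computes can be written out as below. This is the standard simple-regression formula; the mapping of symbols to columns is how I read the query, with x_i as the seconds elapsed since the start of the window and y_i as the value column.

```latex
\hat{\beta}_1
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

An entity has a positive trend when \hat{\beta}_1 > 0. The denominator is zero when all of an entity's rows share one timestamp (including the single-row case), so the slope is undefined there.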

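-- Sample data: three entities sampled within one hour.
create table #temp
(
    entity_id int,
    value int,
    [date] datetime
)

insert into #temp (entity_id, value, [date])
values
(1,10,'20140102 07:00:00 AM'),
(1,20,'20140102 07:15:00 AM'),
(1,30,'20140102 07:30:00 AM'),
(2,50,'20140102 07:00:00 AM'),
(2,20,'20140102 07:47:00 AM'),
(3,40,'20140102 07:00:00 AM'),
(3,40,'20140102 07:52:00 AM')

-- Beta is the least-squares slope of value (y) against seconds since the window start (x).
-- The casts to float keep avg() from returning a truncated integer mean.
-- nullif() avoids a divide-by-zero error when all of an entity's rows share one timestamp.
select entity_id, sum((x-xbar)*(y-ybar))/nullif(sum((x-xbar)*(x-xbar)),0) as Beta
from
(
    select entity_id,
        avg(cast(value as float)) over(partition by entity_id) as ybar,
        value as y,
        avg(cast(datediff(second,'20140102 07:00:00 AM',[date]) as float)) over(partition by entity_id) as xbar,
        datediff(second,'20140102 07:00:00 AM',[date]) as x
    from #temp
    where [date]>='20140102 07:00:00 AM' and [date]<'20140102 08:00:00 AM'
) as Calcs
group by entity_id
-- Keep only entities with a positive slope; a NULL slope is filtered out as well.
having sum((x-xbar)*(y-ybar))/nullif(sum((x-xbar)*(x-xbar)),0)>0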
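To sanity-check the query outside the database, here is a minimal Python sketch that applies the same slope formula to the sample rows above. The helper names (`slope`, `positive_trends`) are my own for illustration and not part of any library.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the answer above: (entity_id, value, timestamp).
rows = [
    (1, 10, "2014-01-02 07:00:00"),
    (1, 20, "2014-01-02 07:15:00"),
    (1, 30, "2014-01-02 07:30:00"),
    (2, 50, "2014-01-02 07:00:00"),
    (2, 20, "2014-01-02 07:47:00"),
    (3, 40, "2014-01-02 07:00:00"),
    (3, 40, "2014-01-02 07:52:00"),
]


def slope(points):
    """Least-squares slope for (x, y) pairs; None if x never varies."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    if sxx == 0:
        return None  # single row, or every row has the same timestamp
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    return sxy / sxx


def positive_trends(rows, start, end):
    """Return {entity_id: slope} for entities with slope > 0 in [start, end)."""
    start_dt = datetime.strptime(start, FMT)
    end_dt = datetime.strptime(end, FMT)
    by_entity = defaultdict(list)
    for entity_id, value, ts in rows:
        t = datetime.strptime(ts, FMT)
        if start_dt <= t < end_dt:
            # x is the number of seconds elapsed since the window start
            by_entity[entity_id].append(((t - start_dt).total_seconds(), value))
    result = {}
    for entity_id, points in by_entity.items():
        beta = slope(points)
        if beta is not None and beta > 0:
            result[entity_id] = beta
    return result


trends = positive_trends(rows, "2014-01-02 07:00:00", "2014-01-02 08:00:00")
print(trends)
```

On this sample only entity 1 comes back: its value rises by 10 every 900 seconds. Entity 2 falls and entity 3 is flat, so both are excluded.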

Answer 1: (score: 1)

If anyone needs this in MySQL, here is the code that worked for me.

datapoint | plays | status_time
1234      | 15    | 2014-01-02 11:30:00
5689      | 21    | 2014-01-02 11:31:00
1234      | 16    | 2014-01-02 11:31:00

-- Window functions (avg ... over) require MySQL 8.0 or later.
select datapoint, 1.0*sum((x-xbar)*(y-ybar))/sum((x-xbar)*(x-xbar)) as Beta
from
(
    select datapoint,
        avg(plays) over(partition by datapoint) as ybar,
        plays as y,
        -- x is seconds elapsed since the window start, so it grows with time.
        -- TIMEDIFF(start, status_time) would give start minus status_time and flip the sign of Beta.
        avg(TIMESTAMPDIFF(SECOND, '2021-03-22 21:00:00', status_time)) over(partition by datapoint) as xbar,
        TIMESTAMPDIFF(SECOND, '2021-03-22 21:00:00', status_time) as x
    from aggregate_datapoints
    where status_time BETWEEN '2021-03-22 21:00:00' and '2021-03-22 22:00:00'
        and type = 'topContent'
) as calcs
group by datapoint
having 1.0*sum((x-xbar)*(y-ybar))/sum((x-xbar)*(x-xbar))>0
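One detail worth checking in any port of this query is the direction of the time difference. MySQL's TIMEDIFF(a, b) returns a minus b, so x only increases with time when the later timestamp comes first. A minimal Python sketch, using illustrative numbers borrowed from answer 0, shows that reversing x flips the sign of the slope:

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs (assumes xs are not all equal)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


elapsed = [0, 900, 1800]  # seconds after the window start
values = [10, 20, 30]     # values that rise over time

forward = slope(elapsed, values)                 # x = status_time - start
backward = slope([-x for x in elapsed], values)  # x = start - status_time

print(forward, backward)
```

The two slopes have equal magnitude and opposite sign, so a `> 0` filter on the reversed x would select falling series instead of rising ones.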