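Calculating the slope of a linear regression for each variable for each day (group)

Time: 2018-03-11 13:18:17

Tags: r linear-regression

Sample data:

The data set has four columns: Time, Var1, Var2 and Var3. Time has a granularity of 1 minute, but the regression should be run per day.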

Time <- format(seq(as.POSIXct("2018-02-01 23:12:00"), as.POSIXct("2018-02-25 08:32:00"), by="min"), tz = "EST")
df <- data.frame(Time, Var1=runif(length(Time)), Var2=runif(length(Time)), Var3=runif(length(Time)))

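Question:

How can I run a linear regression for each variable for each day? The output should be the slope of Var1, Var2 and Var3 for each day.

A close solution:

The closest solution I could find comes from a post. However, ROC from the TTR package is not a "slope" based on linear regression.

Any ideas for this task, i.e. computing the slope of each variable for each day?

My solution: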

df$Time <- as.Date(df$Time) 
df$year <- format(df$Time,format="%Y") 
df$mth <- format(df$Time,format="%m") 
df$day <- format(df$Time,format="%d") 
aggregate( df$Var1 ~ year + mth + day , df , SLOPE_FUNCTION ) 
aggregate( df$Var2 ~ year + mth + day , df , SLOPE_FUNCTION ) 
aggregate( df$Var3 ~ year + mth + day , df , SLOPE_FUNCTION ) 

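Could you tell me how to create SLOPE_FUNCTION based on lm so that it returns the slope, and how to apply aggregate to every column (i.e. Var1, Var2 and Var3) in one line of code?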

2 answers:

Answer 0: (score: 0)

If you are just after the change from one Time to the next, you can do the following:

library(tidyverse)
as_data_frame(df) %>%
  mutate_if(is.numeric, funs(. / lag(.)))

# # A tibble: 33,681 x 4
#    Time                  Var1   Var2    Var3
#    <fct>                <dbl>  <dbl>   <dbl>
#  1 2018-02-01 18:12:00 NA     NA     NA     
#  2 2018-02-01 18:13:00  1.06   1.17   0.433 
#  3 2018-02-01 18:14:00  0.551  0.647  2.41  
#  4 2018-02-01 18:15:00  3.12   1.34   0.134 
#  5 2018-02-01 18:16:00  1.43   0.344  6.43  
#  6 2018-02-01 18:17:00  0.189  0.790  0.823 
#  7 2018-02-01 18:18:00  0.355  3.39   1.51  
#  8 2018-02-01 18:19:00  3.62   0.604  1.17  
#  9 2018-02-01 18:20:00  0.950  0.505  0.0213
# 10 2018-02-01 18:21:00  3.86   2.34  19.5   
# # ... with 33,671 more rows

If you want the percentage change, you can add - 1 to the funs() argument:

as_data_frame(df) %>%
  mutate_if(is.numeric, funs(. / lag(.) - 1))
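As a small worked example of the difference between the two versions (the vector x is made up for illustration): dividing by the lagged value gives a ratio, and subtracting 1 turns that ratio into a fractional change. This sketch assumes that lag() in the code above is dplyr::lag, which shifts a vector by one position; base R's stats::lag is meant for time series objects and does not shift a plain vector.

```r
library(dplyr)

# Made-up values for illustration
x <- c(100, 110, 99)

ratio <- x / dplyr::lag(x)       # each value divided by the previous one
pct   <- x / dplyr::lag(x) - 1   # fractional change relative to the previous value

ratio  # NA, 1.1, 0.9
pct    # NA, 0.1, -0.1
```

The first element is NA because it has no predecessor. Multiply pct by 100 if a percentage is wanted.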

For lm by day, by variable, I would use purrr and broom:

library(tidyverse)
library(lubridate)

as_data_frame(df) %>%
  mutate(Time = ymd_hms(Time)) %>%
  mutate(day = floor_date(Time, unit = "day")) %>%
  gather(variable, value, -day, -Time) %>%
  nest(-day, -variable) %>%
  mutate(model = map(data, ~lm(as.numeric(Time) ~ value, data = .))) %>%
  unnest(model %>% map(broom::tidy))

# # A tibble: 150 x 7
#    day                 variable term           estimate std.error    statistic p.value
#    <dttm>              <chr>    <chr>             <dbl>     <dbl>        <dbl>   <dbl>
#  1 2018-02-01 00:00:00 Var1     (Intercept)  1517518845       618  2457337      0     
#  2 2018-02-01 00:00:00 Var1     value               592      1091        0.543  0.588 
#  3 2018-02-02 00:00:00 Var1     (Intercept)  1517571312      1337  1134724      0     
#  4 2018-02-02 00:00:00 Var1     value              2902      2318        1.25   0.211 
#  5 2018-02-03 00:00:00 Var1     (Intercept)  1517661220      1369  1108633      0     
#  6 2018-02-03 00:00:00 Var1     value       -      3981      2333 -      1.71   0.0881
#  7 2018-02-04 00:00:00 Var1     (Intercept)  1517744983      1318  1151672      0     
#  8 2018-02-04 00:00:00 Var1     value              1170      2275        0.514  0.607 
#  9 2018-02-05 00:00:00 Var1     (Intercept)  1517833026      1369  1109079      0     
# 10 2018-02-05 00:00:00 Var1     value       -      2027      2303 -      0.880  0.379 
# # ... with 140 more rows

If you are only interested in the slopes, you can add %>% filter(term == "value") to the pipeline.
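Note that the model above has as.numeric(Time) on the left-hand side, so its "value" coefficient describes seconds per unit of the variable. If the goal is the slope of each variable over time, as the question asks, the regression would go the other way round. A minimal base R sketch of that, using aggregate as in the question, might look like the following. The helper name slope_fun and the small UTC data set are made up for illustration, and the sketch assumes that the rows within each day are in time order, one per minute, with no gaps.

```r
# Illustrative data: three full days of 1-minute observations (UTC for simplicity)
set.seed(1)
Time <- seq(as.POSIXct("2018-02-01 00:00:00", tz = "UTC"),
            as.POSIXct("2018-02-03 23:59:00", tz = "UTC"), by = "min")
df <- data.frame(Time,
                 Var1 = runif(length(Time)),
                 Var2 = runif(length(Time)),
                 Var3 = runif(length(Time)))
df$day <- as.Date(df$Time, tz = "UTC")

# slope_fun is a hypothetical helper: it regresses the values on their
# position within the day, so the slope is "per minute" only if the rows
# are in time order, one per minute, with no gaps (an assumption)
slope_fun <- function(y) unname(coef(lm(y ~ seq_along(y)))[2])

# cbind() on the left-hand side applies slope_fun to every column at once
slopes <- aggregate(cbind(Var1, Var2, Var3) ~ day, data = df, FUN = slope_fun)
slopes
```

Because aggregate only passes the values of each group to FUN, the time axis has to be reconstructed from the row position. If minutes can be missing, fitting against an explicit time column per group is the safer choice.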

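Finally, you may prefer to visualize the data. You can use geom_smooth() with method = "lm" to skip the model building; see below. Note: I filtered down to a few days because the plot gets busy quickly.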

as_data_frame(df) %>%
  mutate(Time = ymd_hms(Time)) %>%
  mutate(day = floor_date(Time, unit = "day")) %>%
  filter(day <= ymd("2018-02-05")) %>%
  gather(variable, value, -day, -Time) %>%
  ggplot(., aes(x = Time, y = value, color = factor(day))) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ variable)

R Plot

Also, if you make use of interaction with group, you can change things up slightly, depending on what you are after in terms of interpretation:

as_data_frame(df) %>%
  mutate(Time = ymd_hms(Time)) %>%
  mutate(day = floor_date(Time, unit = "day")) %>%
  filter(day <= ymd("2018-02-05")) %>%
  gather(variable, value, -day, -Time) %>%
  ggplot(., aes(x = Time, y = value, color = variable, 
                group = interaction(variable, factor(day)))) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", se = FALSE) 

Another plot

Answer 1: (score: 0)

Once the data is properly tidied, you can do this with nlme::lmList.

library(tidyverse)
library(lubridate)
df2 <- df %>%
  ## reshape data to get Time repeated for each variable
  gather(var,value,-Time) %>%
  mutate(Time=ymd_hms(Time),   ## convert to date-time variable
         date=date(Time),      ## date info only
         timeval=Time-floor_date(Time,"day"),  ## time since beginning of day
         datevar=interaction(date,var))        ## date/var combo

Now you can fit all date/variable combinations at once:

nlme::lmList(value~timeval|datevar,df2)
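To get from the fitted list to the slopes themselves, one option is coef(), which for an lmList object returns one row per group and one column per coefficient. The sketch below uses a small made-up long-format data set with the same column names as df2 (timeval is a plain number of minutes here, which is an assumption made to keep the example simple).

```r
library(nlme)

set.seed(1)
# Illustrative long-format data: two days x two variables, 60 observations each
df2 <- expand.grid(timeval = 0:59,
                   date = c("2018-02-01", "2018-02-02"),
                   var = c("Var1", "Var2"),
                   stringsAsFactors = FALSE)
df2$value <- runif(nrow(df2))
df2$datevar <- interaction(df2$date, df2$var)   # date/var combo

# One separate lm fit per date/variable combination
fits <- lmList(value ~ timeval | datevar, data = df2)

# Keep only the slope column; row names identify the date/variable group
slopes <- coef(fits)[, "timeval", drop = FALSE]
slopes
```

The slope is expressed per unit of timeval, so its interpretation depends on the units that column ends up in.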