Mean of lags in dplyr

Time: 2019-07-15 11:02:29

Tags: dplyr

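I have a follow-up question on the topic "Calculate mean by group using dplyr package":

Suppose you want to do the same thing, but now create a variable that is the mean of two lags.

How can the mean function be used here?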

Df <- Df %>%
  group_by(id) %>%
  mutate(seasonal = mean(lag(x,12), lag(x,24), lag(x,36), lag(x,48), lag(x,60), lag(x,72))) %>%
  mutate(seasonalreversal = mean(lag(x,1),lag(x,2)....,lag(x,11),lag(x,13)...

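I am trying to create a pattern variable that takes the mean of the same calendar month over the past 20 years (lags 12, 24, ..., 240), and a second variable that takes the mean of the lags for the other months (1-11, 13-23, 25-35, ..., 229-239). It must use at least 5 years of data, so I created a variable "datem" that increases by one between consecutive observations.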

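Note that I also want to keep the other variables, so I also tried summarise, but both attempts return an error because the mean function cannot take several variables as separate arguments.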
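One possible way around this, sketched under assumptions: mean() averages a single vector (its second positional argument is `trim`), so the lagged vectors can be bound into a matrix and averaged row by row. The helper name `lag_mean` and the toy `Df` below are made up for illustration, and only the six lags written out in the question are used.

```r
library(dplyr)

# Toy data (hypothetical): one id with 80 monthly values 1, 2, ..., 80
Df <- tibble(id = 1, x = as.numeric(1:80))

# Bind one lagged copy of x per requested lag into a matrix (one column per
# lag) and average across the columns. A row is NA if any lag is missing.
lag_mean <- function(x, lags) {
  m <- do.call(cbind, lapply(lags, function(k) dplyr::lag(x, k)))
  rowMeans(m)
}

Df <- Df %>%
  group_by(id) %>%
  mutate(seasonal = lag_mean(x, seq(12, 72, by = 12))) %>%
  ungroup()
```

Because the other columns are kept by mutate(), this avoids summarise() altogether. With this toy input the first 72 rows are NA, since lag 72 is not yet available there.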

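This is what I want one of the series to look like. The first value is computed once 5 years of data are available, i.e. at least 5 years are needed to start. It can use at most 20 years of past data. If there are 21 years of data, the series does not use the first year.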

date    seriesid    totret  seasonal    seasonalreversal
1825-06-15  73717   0   NA  NA
1825-07-15  73717   -0.004149341    NA  NA
1825-08-15  73717   0.004166667 NA  NA
1825-09-15  73717   0.004149378 NA  NA
1825-10-15  73717   0.010330579 NA  NA
1825-11-15  73717   -0.00204499 NA  NA
1825-12-15  73717   0.03140648  NA  NA
1826-01-15  73717   -0.026476578    NA  NA
1826-02-15  73717   -0.005230126    NA  NA
1826-03-15  73717   -0.001051525    NA  NA
1826-04-15  73717   0.018947368 NA  NA
1826-05-15  73717   0.008264463 NA  NA
1826-06-15  73717   0.024590164 NA  NA
1826-07-15  73717   0.004098341 NA  NA
1826-08-15  73717   0.004081633 NA  NA
1826-09-15  73717   -0.00203252 NA  NA
1826-10-15  73717   0   NA  NA
1826-11-15  73717   0.00203666  NA  NA
1826-12-15  73717   0.016260163 NA  NA
1827-01-15  73717   -0.008196695    NA  NA
1827-02-15  73717   0   NA  NA
1827-03-15  73717   0   NA  NA
1827-04-15  73717   0.002066116 NA  NA
1827-05-15  73717   0   NA  NA
1827-06-15  73717   0.027077923 NA  NA
1827-07-15  73717   -0.024489796    NA  NA
1827-08-15  73717   0.008368201 NA  NA
1827-09-15  73717   0.010373444 NA  NA
1827-10-15  73717   0   NA  NA
1827-11-15  73717   0.006160164 NA  NA
1827-12-15  73717   0.008163265 NA  NA
1828-01-15  73717   -0.024691337    NA  NA
1828-02-15  73717   0   NA  NA
1828-03-15  73717   -0.016877637    NA  NA
1828-04-15  73717   -0.006437768    NA  NA
1828-05-15  73717   -0.010799136    NA  NA
1828-06-15  73717   0.008733624 NA  NA
1828-07-15  73717   0.00663719  NA  NA
1828-08-15  73717   -0.017582418    NA  NA
1828-09-15  73717   -0.015659955    NA  NA
1828-10-15  73717   0.004545455 NA  NA
1828-11-15  73717   0   NA  NA
1828-12-15  73717   0   NA  NA
1829-01-15  73717   -2.11114E-08    NA  NA
1829-02-15  73717   -0.023148148    NA  NA
1829-03-15  73717   -0.009478673    NA  NA
1829-04-15  73717   -0.007177033    NA  NA
1829-05-15  73717   -0.009638554    NA  NA
1829-06-15  73717   -0.02189781 NA  NA
1829-07-15  73717   0.025510196 NA  NA
1829-08-15  73717   0.02238806  NA  NA
1829-09-15  73717   0   NA  NA
1829-10-15  73717   0   NA  NA
1829-11-15  73717   0   NA  NA
1829-12-15  73717   0.01946472  NA  NA
1830-01-15  73717   -2.79358E-08    NA  NA
1830-02-15  73717   0   NA  NA
1830-03-15  73717   0.004889976 NA  NA
1830-04-15  73717   0.0243309   NA  NA
1830-05-15  73717   0.014251781 NA  NA
1830-06-15  73717   0.007025761 0.00770078  0.000831435
1830-07-15  73717   0.030952369 0.001521318 0.001493786
1830-08-15  73717   -0.002309469    0.004284428 0.001768225
1830-09-15  73717   0.00462963  -0.000633931    0.002121916
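The rule described above (at least 5 and at most 20 years of past data) could be sketched as follows. This is an illustration under assumptions, not a confirmed solution: the helper names `lag_matrix` and `add_seasonal` are invented here, the data are assumed to have one row per month with no gaps, and both variables are set to NA until 5 same-month lags exist, which is how the table above appears to behave.

```r
library(dplyr)

# Matrix with one column per lag, so several lags can be averaged row-wise.
lag_matrix <- function(x, lags) {
  do.call(cbind, lapply(lags, function(k) dplyr::lag(x, k)))
}

add_seasonal <- function(df, min_years = 5) {
  same_month   <- seq(12, 240, by = 12)       # lags 12, 24, ..., 240
  other_months <- setdiff(1:239, same_month)  # lags 1-11, 13-23, ..., 229-239
  df %>%
    group_by(seriesid) %>%
    arrange(date, .by_group = TRUE) %>%
    mutate(
      # number of same-month lags that are available for each row
      n_years = rowSums(!is.na(lag_matrix(totret, same_month))),
      seasonal = if_else(
        n_years >= min_years,
        rowMeans(lag_matrix(totret, same_month), na.rm = TRUE),
        NA_real_
      ),
      seasonalreversal = if_else(
        n_years >= min_years,
        rowMeans(lag_matrix(totret, other_months), na.rm = TRUE),
        NA_real_
      )
    ) %>%
    ungroup() %>%
    select(-n_years)
}

# Hypothetical toy series: 61 monthly values 1, 2, ..., 61
toy <- tibble(
  seriesid = 1,
  date = seq(as.Date("1825-06-15"), by = "month", length.out = 61),
  totret = as.numeric(1:61)
)
res <- add_seasonal(toy)
```

Capping the lags at 240 is what drops data older than 20 years. As a check against the table: on 1830-06-15 the five available June lags average to (0 + 0.024590164 + 0.027077923 + 0.008733624 - 0.02189781) / 5 = 0.00770078, which matches the first non-missing seasonal value shown.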

Here is more than 20 years of data:

date    seriesid    totret
1825-06-15  1   0
1825-07-15  1   -0.004149341
1825-08-15  1   0.004166667
1825-09-15  1   0.004149378
1825-10-15  1   0.010330579
1825-11-15  1   -0.00204499
1825-12-15  1   0.03140648
1826-01-15  1   -0.026476578
1826-02-15  1   -0.005230126
1826-03-15  1   -0.001051525
1826-04-15  1   0.018947368
1826-05-15  1   0.008264463
1826-06-15  1   0.024590164
1826-07-15  1   0.004098341
1826-08-15  1   0.004081633
1826-09-15  1   -0.00203252
1826-10-15  1   0
1826-11-15  1   0.00203666
1826-12-15  1   0.016260163
1827-01-15  1   -0.008196695
1827-02-15  1   0
1827-03-15  1   0
1827-04-15  1   0.002066116
1827-05-15  1   0
1827-06-15  1   0.027077923
1827-07-15  1   -0.024489796
1827-08-15  1   0.008368201
1827-09-15  1   0.010373444
1827-10-15  1   0
1827-11-15  1   0.006160164
1827-12-15  1   0.008163265
1828-01-15  1   -0.024691337
1828-02-15  1   0
1828-03-15  1   -0.016877637
1828-04-15  1   -0.006437768
1828-05-15  1   -0.010799136
1828-06-15  1   0.008733624
1828-07-15  1   0.00663719
1828-08-15  1   -0.017582418
1828-09-15  1   -0.015659955
1828-10-15  1   0.004545455
1828-11-15  1   0
1828-12-15  1   0
1829-01-15  1   -2.11114E-08
1829-02-15  1   -0.023148148
1829-03-15  1   -0.009478673
1829-04-15  1   -0.007177033
1829-05-15  1   -0.009638554
1829-06-15  1   -0.02189781
1829-07-15  1   0.025510196
1829-08-15  1   0.02238806
1829-09-15  1   0
1829-10-15  1   0
1829-11-15  1   0
1829-12-15  1   0.01946472
1830-01-15  1   -2.79358E-08
1830-02-15  1   0
1830-03-15  1   0.004889976
1830-04-15  1   0.0243309
1830-05-15  1   0.014251781
1830-06-15  1   0.007025761
1830-07-15  1   0.030952369
1830-08-15  1   -0.002309469
1830-09-15  1   0.00462963
1830-10-15  1   0.00921659
1830-11-15  1   0.00456621
1830-12-15  1   0.002272727
1831-01-15  1   0.009280759
1831-02-15  1   0.001149425
1831-03-15  1   0.003444317
1831-04-15  1   0.013729977
1831-05-15  1   0.0248307
1831-06-15  1   0.017621145
1831-07-15  1   0.022123885
1831-08-15  1   0.008658009
1831-09-15  1   0.006437768
1831-10-15  1   0.004264392
1831-11-15  1   0
1831-12-15  1   0.021691929
1832-01-15  1   -0.031847134
1832-02-15  1   -0.004385965
1832-03-15  1   -0.015418502
1832-04-15  1   -0.011185682
1832-05-15  1   0.012443439
1832-06-15  1   0.046618922
1832-07-15  1   -0.024017467
1832-08-15  1   0
1832-09-15  1   0
1832-10-15  1   0
1832-11-15  1   0.017897092
1832-12-15  1   0.041758242
1833-01-15  1   -0.034482805
1833-02-15  1   0.002232143
1833-03-15  1   0.002227171
1833-04-15  1   0.008888889
1833-05-15  1   0.017621145
1833-06-15  1   0.028138528
1833-07-15  1   0.00755941
1833-08-15  1   0
1833-09-15  1   0
1833-10-15  1   0.024651661
1833-11-15  1   -0.031380753
1833-12-15  1   0.028077754
1834-01-15  1   -0.053879337
1834-02-15  1   -0.009111617
1834-03-15  1   -0.034482759
1834-04-15  1   0.05952381
1834-05-15  1   -0.015730337
1834-06-15  1   -0.00456621
1834-07-15  1   0.021226443
1834-08-15  1   0.008660508
1834-09-15  1   0.00057241
1834-10-15  1   0.004576659
1834-11-15  1   0.006833713
1834-12-15  1   -0.001131222
1835-01-15  1   0.018626295
1835-02-15  1   0
1835-03-15  1   0.009142857
1835-04-15  1   0.009060023
1835-05-15  1   0.03030303
1835-06-15  1   0.005446623
1835-07-15  1   0.045810087
1835-08-15  1   -0.008012821
1835-09-15  1   -0.01992461
1835-10-15  1   -0.004395604
1835-11-15  1   0.002207506
1835-12-15  1   0
1836-01-15  1   0.013636388
1836-02-15  1   0.025784753
1836-03-15  1   -0.007650273
1836-04-15  1   0.007709251
1836-05-15  1   0.007650273
1836-06-15  1   0.006507592
1836-07-15  1   0.006666712
1836-08-15  1   0.003311258
1836-09-15  1   0.00110011
1836-10-15  1   0
1836-11-15  1   0
1836-12-15  1   0.031746007
1837-01-15  1   -0.016483516
1837-02-15  1   -0.001117318
1837-03-15  1   -0.011185682
1837-04-15  1   -0.013574661
1837-05-15  1   -0.099770642
1837-06-15  1   0.024203822
1837-07-15  1   0.032051259
1837-08-15  1   0.017391304
1837-09-15  1   0.010989011
1837-10-15  1   0.007246377
1837-11-15  1   0.007194245
1837-12-15  1   0.017857143
1838-01-15  1   -0.025270727
1838-02-15  1   -0.007407407
1838-03-15  1   0.002487562
1838-04-15  1   0.001240695
1838-05-15  1   0.016109046
1838-06-15  1   0.049682711
1838-07-15  1   -0.028708134
1838-08-15  1   0.006157635
1838-09-15  1   0
1838-10-15  1   0.00244798
1838-11-15  1   -0.001221001
1838-12-15  1   0.002444988
1839-01-15  1   0.020100534
1839-02-15  1   -0.003078818
1839-03-15  1   -0.009264978
1839-04-15  1   0.00872818
1839-05-15  1   -0.003708282
1839-06-15  1   0.007444169
1839-07-15  1   0.003787905
1839-08-15  1   -0.056603774
1839-09-15  1   0
1839-10-15  1   0
1839-11-15  1   -0.008
1839-12-15  1   0
1840-01-15  1   0.005405382
1840-02-15  1   0
1840-03-15  1   0
1840-04-15  1   0.055107527
1840-05-15  1   -0.01656051
1840-06-15  1   0.023316062
1840-07-15  1   0.06469004
1840-08-15  1   0
1840-09-15  1   -0.032911392
1840-10-15  1   0.002617801
1840-11-15  1   0.033942559
1840-12-15  1   0.005050505
1841-01-15  1   -0.049222771
1841-02-15  1   -0.039509537
1841-03-15  1   -0.092198582
1841-04-15  1   -0.034375
1841-05-15  1   0.019417476
1841-06-15  1   0.031746032
1841-07-15  1   -0.05520506
1841-08-15  1   0.010016694
1841-09-15  1   0
1841-10-15  1   0
1841-11-15  1   -0.041322314
1841-12-15  1   -0.224137931
1842-01-15  1   -0.271300443
1842-02-15  1   -0.021538462
1842-03-15  1   -0.056603774
1842-04-15  1   0.266666667
1842-05-15  1   0
1842-06-15  1   -0.039473684
1842-07-15  1   0.011080334
1842-08-15  1   -0.123287671
1842-09-15  1   0
1842-10-15  1   0.015625
1842-11-15  1   0.058461538
1842-12-15  1   0.117366569
1843-01-15  1   0.115789474
1843-02-15  1   0.301886792
1843-03-15  1   -0.011775362
1843-04-15  1   0.052245646
1843-05-15  1   0.06445993
1843-06-15  1   0.011456628
1843-07-15  1   0.011784516
1843-08-15  1   0.023294509
1843-09-15  1   0.105691057
1843-10-15  1   0.014705882
1843-11-15  1   0.014492754
1843-12-15  1   0.057142857
1844-01-15  1   0.023742988
1844-02-15  1   -0.010914052
1844-03-15  1   0.011034483
1844-04-15  1   -0.02319236
1844-05-15  1   -0.012569832
1844-06-15  1   0.032531825
1844-07-15  1   0.110315232
1844-08-15  1   0
1844-09-15  1   -0.03483871
1844-10-15  1   0.004010695
1844-11-15  1   0.015978695
1844-12-15  1   0.008519004
1845-01-15  1   0.030169456
1845-02-15  1   0
1845-03-15  1   -0.015465614
1845-04-15  1   0.018716578
1845-05-15  1   0.038057743
1845-06-15  1   0
1845-07-15  1   0.048748326
1845-08-15  1   0.025125628
1845-09-15  1   0.013480392
1845-10-15  1   0.00241838
1845-11-15  1   -0.009650181
1845-12-15  1   -0.008526188
1846-01-15  1   -0.002557543
1846-02-15  1   0.021153846
1846-03-15  1   0.003138732
1846-04-15  1   0
1846-05-15  1   -0.01126408
1846-06-15  1   0.016455696
1846-07-15  1   0.032425423
1846-08-15  1   -0.00879397
1846-09-15  1   -0.005069708
1846-10-15  1   0.01910828
1846-11-15  1   0
1846-12-15  1   0.0275
1847-01-15  1   0.020253163
1847-02-15  1   0.019851117
1847-03-15  1   0.008515815
1847-04-15  1   0.133896261

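1 Answer:

Answer 0: (score: 0)

Your question is a bit confusing. What output exactly do you expect? Could you provide a reproducible example?

I tried to answer what you wrote here: "I am trying to create a pattern variable that takes the mean of the same calendar month over the past 20 years (lags 12, 24, ..., 240), and a second variable that takes the mean of the lags for the other months (1-11, 13-23, 25-35, ..., 229-239)"

I created a toy data frame with three variables (year, month, amount = some random values), covering 20 years with one observation per month in each year (so 240 observations).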

library(dplyr)

df <- tibble(year=rep(1:20,12),month=rep(1:12,each=20),amount=runif(240)) %>% 
  arrange(year,month)
df

# find the monthly increase year to year (eg, January year 1 to January year 2)
df <- df %>% mutate(diff.yearly=amount-lag(amount,12)) 

# find monthly mean over 20 years
df %>% group_by(month) %>% summarise(m=mean(amount))

# for each month, pair its 20-year mean with the mean of the other 11 months
lapply(1:12,function(x){
  seasonal <- df %>% filter(month==x) %>% 
    summarise(seasonal_mean=mean(amount))
  seasonal_rev <- df %>% filter(month!=x) %>% 
    summarise(seasonal_rev_mean=mean(amount))
  cbind(seasonal,seasonal_rev)
})

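I think the last command, which uses lapply(), answers part of the question. It finds the mean of one month across the 20 years and the mean of the other 11 months across the 20 years. It loops over each month and returns 12 pairs of values (i.e., the first pair is the mean of month 1 and the mean of months 2-12, the second pair is the mean of month 2 and the mean of months 1 and 3-12).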
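As a sketch of an alternative, the same 12 pairs can be collected into a single data frame with grouped dplyr verbs. The mean of the other 11 months is obtained by subtracting each month's sum and count from the overall totals. The names `total_sum`, `total_n` and `monthly` are illustrative only, and the toy data are rebuilt here (with an arbitrary seed) so the block runs on its own.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the toy data are reproducible
df <- tibble(year = rep(1:20, 12),
             month = rep(1:12, each = 20),
             amount = runif(240)) %>%
  arrange(year, month)

total_sum <- sum(df$amount)  # sum over all 240 observations
total_n   <- nrow(df)        # 240

monthly <- df %>%
  group_by(month) %>%
  summarise(
    seasonal_mean = mean(amount),
    # mean of all observations that are NOT in this month
    seasonal_rev_mean = (total_sum - sum(amount)) / (total_n - n())
  )
```

The result has one row per month, which makes it straightforward to join back onto the original rows by month.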

Update: here is my new solution, which follows the steps outlined in my comment below.

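# packages used below; dlply() additionally comes from the plyr package
library(dplyr)
library(tidyr)
library(stringr)

set.seed(123)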
# create df of series data
# series 1 = 4 years, so remove -- 48 rows
# series 2 = 5 years, so keep -- 60 rows
# series 3 = 20 years, so keep -- 240 rows
# series 4 = 21 years, so remove first year -- 252 rows
df <- data.frame(
  series=c(rep('a',48),rep('b',60),rep('c',240),rep('d',252)),
  date=c(seq.Date(as.Date('1820-01-01'),as.Date('1823-12-01'),by='month'),
    seq.Date(as.Date('1820-01-01'),as.Date('1824-12-01'),by='month'),
    seq.Date(as.Date('1820-01-01'),as.Date('1839-12-01'),by='month'),
    seq.Date(as.Date('1820-01-01'),as.Date('1840-12-01'),by='month')),
  value=runif(600))

# sanity checks for toy data
df %>% count(series)
df %>% count(date)
df %>% count(date) %>% count(n)

# split year and month
df <- df %>% mutate(year=str_split_fixed(date,'-',3)[,1],
              month=str_split_fixed(date,'-',3)[,2])
df %>% count(year)
df %>% count(month)

# find series with <5 years of data
years <- df %>% distinct(series,year) %>% group_by(series) %>% 
  mutate(year=as.numeric(year)) %>% 
  summarise(y=max(year)-min(year)) # note: 4=threshold for 5 years

# filter out series with <5 years
# removes series 'a', which only has 4 years of data
df <- left_join(df,years,by='series') %>% filter(y>=4) # keep y var for next step

# find observations for series with >20 years of data -- remove earlier years
df <- df %>% group_by(series) %>% 
  mutate(max.year=max(as.numeric(year)),
         min.year=max.year-19) %>% 
  filter(as.numeric(year) >= min.year) # removes 12 obs from 1820 for series d

# check that each series now has 5-20 years of data
df %>% count(series,year) %>% count(series)

# compute seasonal means
# first split data by series
# second, split by month
# calculate means for each month and for other 11 months
# output a data frame and tidy up to then join with original data
means <- df %>% mutate(month=as.numeric(month)) %>% 
  plyr::dlply(plyr::.(series),function(x){ # call plyr by namespace so it does not mask dplyr verbs
  sapply(1:12,function(i){
    seasonal <- x %>% filter(month==i) %>% 
      summarise(seasonal_mean=mean(value))
    seasonal_rev <- x %>% filter(month!=i) %>% 
      summarise(seasonal_rev_mean=mean(value))
    cbind(seasonal,seasonal_rev)
  })
}) %>% data.frame() %>% mutate(type=row.names(.)) %>% 
  gather(k,v,-type) %>% mutate(v=as.numeric(v)) %>% 
  spread(type,v) %>% # make seasonal and seasonal_rev means columns
  mutate(series=str_split_fixed(k,'\\.',2)[,1],
         month=as.numeric(str_split_fixed(k,'\\.',2)[,2])) %>% 
  select(-k)

# join yearly seasonal means with original data
df <- df %>% mutate(month=as.numeric(month)) %>% left_join(means,by=c('series','month'))

# check that means are repeated across months within each series
df %>% count(series,month,seasonal_mean) %>% distinct(n) # 5, 20, 20

# additional check - monthly seasonal means match the joined data
df %>% group_by(series,month) %>% summarise(m=mean(value)) %>% 
  filter(m %in% df$seasonal_mean) # 36 rows returned, as expected
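The per-series seasonal means built with dlply() above could also be sketched with grouped dplyr verbs alone, which avoids reshaping with gather() and spread(). This is an illustrative alternative under assumptions, not the answer's own method: the names `toy`, `means2`, `series_sum` and `series_n` are made up here, and the toy data contain only two series.

```r
library(dplyr)

set.seed(123)
# Hypothetical toy data: series "b" with 5 years and "c" with 20 years
toy <- bind_rows(
  tibble(series = "b",
         date = seq(as.Date("1820-01-01"), as.Date("1824-12-01"), by = "month")),
  tibble(series = "c",
         date = seq(as.Date("1820-01-01"), as.Date("1839-12-01"), by = "month"))
) %>%
  mutate(value = runif(n()),
         month = as.numeric(format(date, "%m")))

means2 <- toy %>%
  group_by(series) %>%
  mutate(series_sum = sum(value), series_n = n()) %>%  # per-series totals
  group_by(series, month) %>%
  summarise(
    seasonal_mean = mean(value),
    # mean of the other 11 months within the same series
    seasonal_rev_mean = (first(series_sum) - sum(value)) / (first(series_n) - n()),
    .groups = "drop"
  )

# attach the means to every observation of the matching series and month
toy <- left_join(toy, means2, by = c("series", "month"))
```

As in the answer's code, these are whole-sample means per series and month, not the rolling lag-based means described in the question.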