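I have a follow-up question on the topic "Calculate mean by group using dplyr package":
Suppose you want to do the same thing, but now create a variable that is the mean of two lags.
How do I use the mean function here?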
Df <- Df %>%
  group_by(id) %>%
  mutate(seasonal = mean(lag(x,12), lag(x,24), lag(x,36), lag(x,48), lag(x,60), lag(x,72))) %>%
  mutate(seasonalreversal = mean(lag(x,1),lag(x,2)....,lag(x,11),lag(x,13)...
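I am trying to create a pattern variable that takes the mean of the same calendar month over the past 20 years (lags 12, 24, ..., 240), and a second one that takes the mean of the lags for the other months (1-11, 13-23, 25-35, ..., 229-239). It must use at least 5 years of data, so I created a variable "datem" that increases by one between consecutive observations.
Note that I also want to keep the other variables, so I tried summarise as well, but both attempts return an error because the mean function cannot take several variables as separate arguments.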
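A likely reason the attempts fail: `mean()` averages only its first argument, and further positional arguments are matched to `trim` and `na.rm`. The minimal sketch below shows that behaviour and one base R way to average several vectors element by element. The toy vectors `a` and `b` are made up for illustration.

```r
# mean() averages its first argument only; here 2 is taken as trim and 3 as na.rm
mean(1, 2, 3)      # returns 1, not 2
mean(c(1, 2, 3))   # returns 2

# Toy vectors standing in for two lagged copies of a column (made up)
a <- c(1, 2, 3)
b <- c(10, 20, 30)

# Element-wise mean across vectors: bind them as columns, then average each row
pair_mean <- rowMeans(cbind(a, b))
pair_mean          # 5.5 11.0 16.5
```

Inside `mutate()`, the same idea would read `rowMeans(cbind(lag(x, 12), lag(x, 24)), na.rm = TRUE)`.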
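This is how I would like one of the series to look. The first values are computed once 5 years of data are available, since at least 5 years are needed to start. At most 20 years of past data are used: if there are 21 years of data, the series no longer uses the first year.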
date seriesid totret seasonal seasonalreversal
1825-06-15 73717 0 NA NA
1825-07-15 73717 -0.004149341 NA NA
1825-08-15 73717 0.004166667 NA NA
1825-09-15 73717 0.004149378 NA NA
1825-10-15 73717 0.010330579 NA NA
1825-11-15 73717 -0.00204499 NA NA
1825-12-15 73717 0.03140648 NA NA
1826-01-15 73717 -0.026476578 NA NA
1826-02-15 73717 -0.005230126 NA NA
1826-03-15 73717 -0.001051525 NA NA
1826-04-15 73717 0.018947368 NA NA
1826-05-15 73717 0.008264463 NA NA
1826-06-15 73717 0.024590164 NA NA
1826-07-15 73717 0.004098341 NA NA
1826-08-15 73717 0.004081633 NA NA
1826-09-15 73717 -0.00203252 NA NA
1826-10-15 73717 0 NA NA
1826-11-15 73717 0.00203666 NA NA
1826-12-15 73717 0.016260163 NA NA
1827-01-15 73717 -0.008196695 NA NA
1827-02-15 73717 0 NA NA
1827-03-15 73717 0 NA NA
1827-04-15 73717 0.002066116 NA NA
1827-05-15 73717 0 NA NA
1827-06-15 73717 0.027077923 NA NA
1827-07-15 73717 -0.024489796 NA NA
1827-08-15 73717 0.008368201 NA NA
1827-09-15 73717 0.010373444 NA NA
1827-10-15 73717 0 NA NA
1827-11-15 73717 0.006160164 NA NA
1827-12-15 73717 0.008163265 NA NA
1828-01-15 73717 -0.024691337 NA NA
1828-02-15 73717 0 NA NA
1828-03-15 73717 -0.016877637 NA NA
1828-04-15 73717 -0.006437768 NA NA
1828-05-15 73717 -0.010799136 NA NA
1828-06-15 73717 0.008733624 NA NA
1828-07-15 73717 0.00663719 NA NA
1828-08-15 73717 -0.017582418 NA NA
1828-09-15 73717 -0.015659955 NA NA
1828-10-15 73717 0.004545455 NA NA
1828-11-15 73717 0 NA NA
1828-12-15 73717 0 NA NA
1829-01-15 73717 -2.11114E-08 NA NA
1829-02-15 73717 -0.023148148 NA NA
1829-03-15 73717 -0.009478673 NA NA
1829-04-15 73717 -0.007177033 NA NA
1829-05-15 73717 -0.009638554 NA NA
1829-06-15 73717 -0.02189781 NA NA
1829-07-15 73717 0.025510196 NA NA
1829-08-15 73717 0.02238806 NA NA
1829-09-15 73717 0 NA NA
1829-10-15 73717 0 NA NA
1829-11-15 73717 0 NA NA
1829-12-15 73717 0.01946472 NA NA
1830-01-15 73717 -2.79358E-08 NA NA
1830-02-15 73717 0 NA NA
1830-03-15 73717 0.004889976 NA NA
1830-04-15 73717 0.0243309 NA NA
1830-05-15 73717 0.014251781 NA NA
1830-06-15 73717 0.007025761 0.00770078 0.000831435
1830-07-15 73717 0.030952369 0.001521318 0.001493786
1830-08-15 73717 -0.002309469 0.004284428 0.001768225
1830-09-15 73717 0.00462963 -0.000633931 0.002121916
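The table above can be approximated with a rolling mean over fixed sets of lags. For instance, the 1830-06-15 seasonal value (0.00770078) equals the mean of the five earlier June returns. What follows is a base R sketch under stated assumptions, not a definitive implementation: the helper names `lag_vec`, `mean_of_lags` and `seasonal_vars` are hypothetical, the series is assumed to be sorted by date with no missing months, and the reversal is assumed to start in the same row as the seasonal mean.

```r
# Shift x down by k positions, padding the start with NA
# (hypothetical base R stand-in for dplyr::lag)
lag_vec <- function(x, k) {
  n <- length(x)
  c(rep(NA_real_, min(k, n)), head(x, max(n - k, 0)))
}

# Row-wise mean over a set of lags; NA unless at least min_obs lags exist
mean_of_lags <- function(x, lags, min_obs = 1) {
  m <- vapply(lags, function(k) lag_vec(x, k), numeric(length(x)))
  m <- matrix(m, nrow = length(x))        # keep matrix shape for short inputs
  n_ok <- rowSums(!is.na(m))              # how many lags are available per row
  out <- rowSums(m, na.rm = TRUE) / n_ok
  out[n_ok < min_obs] <- NA_real_
  out
}

same_month   <- seq(12, 240, by = 12)          # lags 12, 24, ..., 240
other_months <- setdiff(1:239, same_month)     # lags 1-11, 13-23, ..., 229-239

# Both variables for one series; x must be ordered by date
seasonal_vars <- function(x) {
  seasonal <- mean_of_lags(x, same_month, min_obs = 5)   # needs 5 same-month lags
  reversal <- mean_of_lags(x, other_months)
  reversal[is.na(seasonal)] <- NA_real_   # assumption: start together with seasonal
  data.frame(seasonal = seasonal, seasonalreversal = reversal)
}
```

With dplyr this could be applied per series as `Df %>% group_by(id) %>% mutate(seasonal = seasonal_vars(x)$seasonal, seasonalreversal = seasonal_vars(x)$seasonalreversal)`. Because the lag sets stop at 240, a row with 21 years of history simply ignores the oldest year.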
Here is a series with more than 20 years of data:
date seriesid totret
1825-06-15 1 0
1825-07-15 1 -0.004149341
1825-08-15 1 0.004166667
1825-09-15 1 0.004149378
1825-10-15 1 0.010330579
1825-11-15 1 -0.00204499
1825-12-15 1 0.03140648
1826-01-15 1 -0.026476578
1826-02-15 1 -0.005230126
1826-03-15 1 -0.001051525
1826-04-15 1 0.018947368
1826-05-15 1 0.008264463
1826-06-15 1 0.024590164
1826-07-15 1 0.004098341
1826-08-15 1 0.004081633
1826-09-15 1 -0.00203252
1826-10-15 1 0
1826-11-15 1 0.00203666
1826-12-15 1 0.016260163
1827-01-15 1 -0.008196695
1827-02-15 1 0
1827-03-15 1 0
1827-04-15 1 0.002066116
1827-05-15 1 0
1827-06-15 1 0.027077923
1827-07-15 1 -0.024489796
1827-08-15 1 0.008368201
1827-09-15 1 0.010373444
1827-10-15 1 0
1827-11-15 1 0.006160164
1827-12-15 1 0.008163265
1828-01-15 1 -0.024691337
1828-02-15 1 0
1828-03-15 1 -0.016877637
1828-04-15 1 -0.006437768
1828-05-15 1 -0.010799136
1828-06-15 1 0.008733624
1828-07-15 1 0.00663719
1828-08-15 1 -0.017582418
1828-09-15 1 -0.015659955
1828-10-15 1 0.004545455
1828-11-15 1 0
1828-12-15 1 0
1829-01-15 1 -2.11114E-08
1829-02-15 1 -0.023148148
1829-03-15 1 -0.009478673
1829-04-15 1 -0.007177033
1829-05-15 1 -0.009638554
1829-06-15 1 -0.02189781
1829-07-15 1 0.025510196
1829-08-15 1 0.02238806
1829-09-15 1 0
1829-10-15 1 0
1829-11-15 1 0
1829-12-15 1 0.01946472
1830-01-15 1 -2.79358E-08
1830-02-15 1 0
1830-03-15 1 0.004889976
1830-04-15 1 0.0243309
1830-05-15 1 0.014251781
1830-06-15 1 0.007025761
1830-07-15 1 0.030952369
1830-08-15 1 -0.002309469
1830-09-15 1 0.00462963
1830-10-15 1 0.00921659
1830-11-15 1 0.00456621
1830-12-15 1 0.002272727
1831-01-15 1 0.009280759
1831-02-15 1 0.001149425
1831-03-15 1 0.003444317
1831-04-15 1 0.013729977
1831-05-15 1 0.0248307
1831-06-15 1 0.017621145
1831-07-15 1 0.022123885
1831-08-15 1 0.008658009
1831-09-15 1 0.006437768
1831-10-15 1 0.004264392
1831-11-15 1 0
1831-12-15 1 0.021691929
1832-01-15 1 -0.031847134
1832-02-15 1 -0.004385965
1832-03-15 1 -0.015418502
1832-04-15 1 -0.011185682
1832-05-15 1 0.012443439
1832-06-15 1 0.046618922
1832-07-15 1 -0.024017467
1832-08-15 1 0
1832-09-15 1 0
1832-10-15 1 0
1832-11-15 1 0.017897092
1832-12-15 1 0.041758242
1833-01-15 1 -0.034482805
1833-02-15 1 0.002232143
1833-03-15 1 0.002227171
1833-04-15 1 0.008888889
1833-05-15 1 0.017621145
1833-06-15 1 0.028138528
1833-07-15 1 0.00755941
1833-08-15 1 0
1833-09-15 1 0
1833-10-15 1 0.024651661
1833-11-15 1 -0.031380753
1833-12-15 1 0.028077754
1834-01-15 1 -0.053879337
1834-02-15 1 -0.009111617
1834-03-15 1 -0.034482759
1834-04-15 1 0.05952381
1834-05-15 1 -0.015730337
1834-06-15 1 -0.00456621
1834-07-15 1 0.021226443
1834-08-15 1 0.008660508
1834-09-15 1 0.00057241
1834-10-15 1 0.004576659
1834-11-15 1 0.006833713
1834-12-15 1 -0.001131222
1835-01-15 1 0.018626295
1835-02-15 1 0
1835-03-15 1 0.009142857
1835-04-15 1 0.009060023
1835-05-15 1 0.03030303
1835-06-15 1 0.005446623
1835-07-15 1 0.045810087
1835-08-15 1 -0.008012821
1835-09-15 1 -0.01992461
1835-10-15 1 -0.004395604
1835-11-15 1 0.002207506
1835-12-15 1 0
1836-01-15 1 0.013636388
1836-02-15 1 0.025784753
1836-03-15 1 -0.007650273
1836-04-15 1 0.007709251
1836-05-15 1 0.007650273
1836-06-15 1 0.006507592
1836-07-15 1 0.006666712
1836-08-15 1 0.003311258
1836-09-15 1 0.00110011
1836-10-15 1 0
1836-11-15 1 0
1836-12-15 1 0.031746007
1837-01-15 1 -0.016483516
1837-02-15 1 -0.001117318
1837-03-15 1 -0.011185682
1837-04-15 1 -0.013574661
1837-05-15 1 -0.099770642
1837-06-15 1 0.024203822
1837-07-15 1 0.032051259
1837-08-15 1 0.017391304
1837-09-15 1 0.010989011
1837-10-15 1 0.007246377
1837-11-15 1 0.007194245
1837-12-15 1 0.017857143
1838-01-15 1 -0.025270727
1838-02-15 1 -0.007407407
1838-03-15 1 0.002487562
1838-04-15 1 0.001240695
1838-05-15 1 0.016109046
1838-06-15 1 0.049682711
1838-07-15 1 -0.028708134
1838-08-15 1 0.006157635
1838-09-15 1 0
1838-10-15 1 0.00244798
1838-11-15 1 -0.001221001
1838-12-15 1 0.002444988
1839-01-15 1 0.020100534
1839-02-15 1 -0.003078818
1839-03-15 1 -0.009264978
1839-04-15 1 0.00872818
1839-05-15 1 -0.003708282
1839-06-15 1 0.007444169
1839-07-15 1 0.003787905
1839-08-15 1 -0.056603774
1839-09-15 1 0
1839-10-15 1 0
1839-11-15 1 -0.008
1839-12-15 1 0
1840-01-15 1 0.005405382
1840-02-15 1 0
1840-03-15 1 0
1840-04-15 1 0.055107527
1840-05-15 1 -0.01656051
1840-06-15 1 0.023316062
1840-07-15 1 0.06469004
1840-08-15 1 0
1840-09-15 1 -0.032911392
1840-10-15 1 0.002617801
1840-11-15 1 0.033942559
1840-12-15 1 0.005050505
1841-01-15 1 -0.049222771
1841-02-15 1 -0.039509537
1841-03-15 1 -0.092198582
1841-04-15 1 -0.034375
1841-05-15 1 0.019417476
1841-06-15 1 0.031746032
1841-07-15 1 -0.05520506
1841-08-15 1 0.010016694
1841-09-15 1 0
1841-10-15 1 0
1841-11-15 1 -0.041322314
1841-12-15 1 -0.224137931
1842-01-15 1 -0.271300443
1842-02-15 1 -0.021538462
1842-03-15 1 -0.056603774
1842-04-15 1 0.266666667
1842-05-15 1 0
1842-06-15 1 -0.039473684
1842-07-15 1 0.011080334
1842-08-15 1 -0.123287671
1842-09-15 1 0
1842-10-15 1 0.015625
1842-11-15 1 0.058461538
1842-12-15 1 0.117366569
1843-01-15 1 0.115789474
1843-02-15 1 0.301886792
1843-03-15 1 -0.011775362
1843-04-15 1 0.052245646
1843-05-15 1 0.06445993
1843-06-15 1 0.011456628
1843-07-15 1 0.011784516
1843-08-15 1 0.023294509
1843-09-15 1 0.105691057
1843-10-15 1 0.014705882
1843-11-15 1 0.014492754
1843-12-15 1 0.057142857
1844-01-15 1 0.023742988
1844-02-15 1 -0.010914052
1844-03-15 1 0.011034483
1844-04-15 1 -0.02319236
1844-05-15 1 -0.012569832
1844-06-15 1 0.032531825
1844-07-15 1 0.110315232
1844-08-15 1 0
1844-09-15 1 -0.03483871
1844-10-15 1 0.004010695
1844-11-15 1 0.015978695
1844-12-15 1 0.008519004
1845-01-15 1 0.030169456
1845-02-15 1 0
1845-03-15 1 -0.015465614
1845-04-15 1 0.018716578
1845-05-15 1 0.038057743
1845-06-15 1 0
1845-07-15 1 0.048748326
1845-08-15 1 0.025125628
1845-09-15 1 0.013480392
1845-10-15 1 0.00241838
1845-11-15 1 -0.009650181
1845-12-15 1 -0.008526188
1846-01-15 1 -0.002557543
1846-02-15 1 0.021153846
1846-03-15 1 0.003138732
1846-04-15 1 0
1846-05-15 1 -0.01126408
1846-06-15 1 0.016455696
1846-07-15 1 0.032425423
1846-08-15 1 -0.00879397
1846-09-15 1 -0.005069708
1846-10-15 1 0.01910828
1846-11-15 1 0
1846-12-15 1 0.0275
1847-01-15 1 0.020253163
1847-02-15 1 0.019851117
1847-03-15 1 0.008515815
1847-04-15 1 0.133896261
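Answer 0 (score: 0)
Your question is a bit confusing. What output exactly do you expect? Can you provide a reproducible example?
I have tried to answer what you wrote here: "I am trying to create a pattern variable that takes the mean of the same calendar month over the past 20 years (lags 12, 24, ..., 240), and a second one that takes the mean of the lags for the other months (1-11, 13-23, 25-35, ..., 229-239)"
I created a toy data frame with three variables (year, month, amount = some random value), covering 20 years with one observation per month in each year (so 240 observations).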
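library(dplyr) # provides tibble(), %>%, arrange(), mutate(), lag(), ...
# toy data: one observation per month for 20 years (240 rows)
df <- tibble(year=rep(1:20,12),month=rep(1:12,each=20),amount=runif(240)) %>%
  arrange(year,month)
df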
# find the monthly increase year to year (eg, January year 1 to January year 2)
df <- df %>% mutate(diff.yearly=amount-lag(amount,12))
# find monthly mean over 20 years
df %>% group_by(month) %>% summarise(m=mean(amount))
# for each month: mean of that month, and mean of the other 11 months
lapply(1:12,function(x){
  seasonal <- df %>% filter(month==x) %>%
    summarise(seasonal_mean=mean(amount))
  seasonal_rev <- df %>% filter(month!=x) %>%
    summarise(seasonal_rev_mean=mean(amount))
  cbind(seasonal,seasonal_rev)
})
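I think the last command, which uses lapply(), answers part of your question. It finds the mean of one month across the 20 years, and the mean of the other 11 months across the same 20 years. It loops over the months and returns 12 pairs of values (i.e., the first pair is the mean of month 1 and the mean of months 2-12, the second pair is the mean of month 2 and the mean of months 1 and 3-12).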
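The same 12 pairs can also be obtained without a loop, because the mean of the other 11 months is the grand total minus that month's total, divided by the remaining number of observations. A minimal base R sketch of that identity, using a made-up data frame `toy` with the same columns as the toy data above:

```r
set.seed(1)
# Made-up data: 20 years x 12 months, one random amount per month
toy <- data.frame(year   = rep(1:20, each = 12),
                  month  = rep(1:12, times = 20),
                  amount = runif(240))

month_sum <- tapply(toy$amount, toy$month, sum)     # total per calendar month
month_n   <- tapply(toy$amount, toy$month, length)  # observations per calendar month

result <- data.frame(
  month             = as.integer(names(month_sum)),
  seasonal_mean     = as.numeric(month_sum / month_n),
  # leave-one-month-out mean: everything except the current calendar month
  seasonal_rev_mean = as.numeric((sum(toy$amount) - month_sum) /
                                 (nrow(toy) - month_n))
)
result
```

Each row of `result` corresponds to one element of the list returned by the lapply() call.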
Update: here is my new solution, which follows the steps outlined in my comment below.
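library(plyr)    # dlply() and .(); load before dplyr so dplyr's verbs are not masked
library(dplyr)
library(tidyr)   # gather(), spread()
library(stringr) # str_split_fixed()
set.seed(123)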
# create df of series data
# series a = 4 years, so remove -- 48 rows
# series b = 5 years, so keep -- 60 rows
# series c = 20 years, so keep -- 240 rows
# series d = 21 years, so remove first year -- 252 rows
df <- data.frame(
series=c(rep('a',48),rep('b',60),rep('c',240),rep('d',252)),
date=c(seq.Date(as.Date('1820-01-01'),as.Date('1823-12-01'),by='month'),
seq.Date(as.Date('1820-01-01'),as.Date('1824-12-01'),by='month'),
seq.Date(as.Date('1820-01-01'),as.Date('1839-12-01'),by='month'),
seq.Date(as.Date('1820-01-01'),as.Date('1840-12-01'),by='month')),
value=runif(600))
# sanity checks for toy data
df %>% count(series)
df %>% count(date)
df %>% count(date) %>% count(n)
# split year and month
df <- df %>% mutate(year=str_split_fixed(date,'-',3)[,1],
month=str_split_fixed(date,'-',3)[,2])
df %>% count(year)
df %>% count(month)
# find series with <5 years of data
years <- df %>% distinct(series,year) %>% group_by(series) %>%
mutate(year=as.numeric(year)) %>%
summarise(y=max(year)-min(year)) # note: 4=threshold for 5 years
# filter out series with <5 years
# removes series 'a', which only has 4 years of data
df <- left_join(df,years,by='series') %>% filter(y>=4) # keep y var for next step
# find observations for series with >20 years of data -- remove earlier years
df <- df %>% group_by(series) %>%
mutate(max.year=max(as.numeric(year)),
min.year=max.year-19) %>%
filter(as.numeric(year) >= min.year) # removes 12 obs from 1820 for series d
# check that each series now has 5-20 years of data
df %>% count(series,year) %>% count(series)
# compute seasonal means
# first split data by series
# second, split by month
# calculate means for each month and for other 11 months
# output a data frame and tidy up to then join with original data
means <- df %>% mutate(month=as.numeric(month)) %>%
dlply(.(series),function(x){
sapply(1:12,function(i){
seasonal <- x %>% filter(month==i) %>%
summarise(seasonal_mean=mean(value))
seasonal_rev <- x %>% filter(month!=i) %>%
summarise(seasonal_rev_mean=mean(value))
cbind(seasonal,seasonal_rev)
})
}) %>% data.frame() %>% mutate(type=row.names(.)) %>%
gather(k,v,-type) %>% mutate(v=as.numeric(v)) %>%
spread(type,v) %>% # make seasonal and seasonal_rev means columns
mutate(series=str_split_fixed(k,'\\.',2)[,1],
month=as.numeric(str_split_fixed(k,'\\.',2)[,2])) %>%
select(-k)
# join yearly seasonal means with original data
df <- df %>% mutate(month=as.numeric(month)) %>% left_join(.,means)
# check that means are repeated across months within each series
df %>% count(series,month,seasonal_mean) %>% distinct(n) # 5, 20, 20
# additional check - monthly seasonal means match the joined data
df %>% group_by(series,month) %>% summarise(m=mean(value)) %>%
filter(m %in% df$seasonal_mean) # 36 rows returned, as expected
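The two filtering steps in the code above (drop series with fewer than 5 years, keep only the most recent 20 years) can also be written with base R's `ave()`. This is a sketch under assumptions rather than a replacement for the pipeline: the data frame `toy` and the names `first_year`, `last_year`, `keep` and `trimmed` are made up for illustration.

```r
# Made-up data: series 'a' has 4 years, series 'd' has 21 years of monthly rows
toy <- data.frame(
  series = c(rep("a", 48), rep("d", 252)),
  date   = c(seq(as.Date("1820-01-01"), by = "month", length.out = 48),
             seq(as.Date("1820-01-01"), by = "month", length.out = 252))
)
toy$year <- as.integer(format(toy$date, "%Y"))

# Per-series first and last year, repeated on every row of that series
first_year <- ave(toy$year, toy$series, FUN = min)
last_year  <- ave(toy$year, toy$series, FUN = max)

keep <- (last_year - first_year + 1 >= 5) &   # at least 5 calendar years
        (toy$year >= last_year - 19)          # only the most recent 20 years
trimmed <- toy[keep, ]

table(trimmed$series)   # only series 'd' remains, with 240 rows
```

Extracting the year with `format(date, "%Y")` keeps it numeric from the start, so no string comparison is needed when filtering.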