Dataset:
date bal
1/31/2013 10
1/31/2013 11
1/31/2013 12
1/31/2013 13
1/31/2013 14
2/28/2013 20
2/28/2013 30
2/28/2013 40
2/28/2013 50
2/28/2013 60
3/30/2013 10
3/30/2013 11
3/30/2013 12
3/30/2013 13
3/30/2013 15
Code used:
bb <- read.csv("abc.csv", stringsAsFactors=T, header=T)
bb
library(dplyr)
new_data <- bb %>%
mutate(D = (bal) / lag(bal[1:5])) %>%
data.frame()
new_data
We are dividing group 2 (date 2/28/2013, second row = 30) by group 1 (date 1/31/2013, first row = 10), i.e. 30 / 10 = 3.000, 40 / 11 = 3.63, 50 / 12 = 4.16, and so on.
Output from the code above:
date bal D
1 1/31/2013 10 NA
2 1/31/2013 11 1.100000
3 1/31/2013 12 1.090909
4 1/31/2013 13 1.083333
5 1/31/2013 14 1.076923
6 2/28/2013 20 NA
7 2/28/2013 30 3.000000
8 2/28/2013 40 3.636364
9 2/28/2013 50 4.166667
10 2/28/2013 60 4.615385
11 3/30/2013 10 NA
12 3/30/2013 11 1.100000
13 3/30/2013 12 1.090909
14 3/30/2013 13 1.083333
15 3/30/2013 15 1.153846
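The pattern in this output can be reproduced without dplyr, which may help explain it. The sketch below assumes that `lag(bal[1:5])` evaluates to `NA, 10, 11, 12, 13` and that R then recycles this length-5 vector across all 15 rows; the names `divisor` and `D` are only illustrative.

```r
# bal column from the question's data set
bal <- c(10, 11, 12, 13, 14, 20, 30, 40, 50, 60, 10, 11, 12, 13, 15)

# Base R equivalent of dplyr::lag(bal[1:5]): shift the first group down by one
divisor <- c(NA, head(bal[1:5], -1))   # NA 10 11 12 13

# Dividing a length-15 vector by a length-5 vector recycles the shorter one,
# so every date group is divided by the same first-group values
D <- bal / divisor
print(round(D, 6))
```

Under these assumptions, rows 1, 6 and 11 are NA because each of them lines up with the leading NA of the recycled divisor.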
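The problem now is:
The first group stays fixed as the reference (the divisor, i.e. 10, 11, 12, 13), which means the bal values of every later date group are divided by this first reference group.
We want the divisor to move forward one date group each time, so that it is always the group immediately before the dividend's group, like this: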
date bal D
1 1/31/2013 10 NA
2 1/31/2013 11 NA
3 1/31/2013 12 NA
4 1/31/2013 13 NA
5 1/31/2013 14 NA
6 2/28/2013 20 NA
7 2/28/2013 30 3.000000 (30 / 10)
8 2/28/2013 40 3.636364 (40 / 11)
9 2/28/2013 50 4.166667 (50 / 12)
10 2/28/2013 60 4.615385 (60 / 13)
11 3/30/2013 10 NA
12 3/30/2013 11 0.550000 (11 / 20)
13 3/30/2013 12 0.400000 (12 / 30)
14 3/30/2013 13 0.325000 (13 / 40)
15 3/30/2013 15 0.300000 (15 / 50)
This is the output I expect.
Answer 0 (score: 0)
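# DF is the data frame from the question (called bb there)
library(dplyr)
DF %>%
  # g1: position of the row within its date group (1, 2, 3, 4, 0)
  group_by(g1 = seq_along(bal) %% 5) %>%
  # within g1, lag() picks the value at the same position in the previous date group
  mutate(denominator = lag(bal)) %>%
  ungroup() %>%
  # g2: index of the date group (0, 1, 2)
  group_by(g2 = (seq_along(bal) - 1) %/% 5) %>%
  # shift the denominator down one row inside each date group, then divide
  mutate(denominator = lag(denominator),
         D = bal / denominator) %>%
  ungroup()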
# # A tibble: 15 x 6
# date bal g1 denominator g2 D
# <fctr> <int> <dbl> <int> <dbl> <dbl>
# 1 1/31/2013 10 1 NA 0 NA
# 2 1/31/2013 11 2 NA 0 NA
# 3 1/31/2013 12 3 NA 0 NA
# 4 1/31/2013 13 4 NA 0 NA
# 5 1/31/2013 14 0 NA 0 NA
# 6 2/28/2013 20 1 NA 1 NA
# 7 2/28/2013 30 2 10 1 3.000000
# 8 2/28/2013 40 3 11 1 3.636364
# 9 2/28/2013 50 4 12 1 4.166667
# 10 2/28/2013 60 0 13 1 4.615385
# 11 3/30/2013 10 1 NA 2 NA
# 12 3/30/2013 11 2 20 2 0.550000
# 13 3/30/2013 12 3 30 2 0.400000
# 14 3/30/2013 13 4 40 2 0.325000
# 15 3/30/2013 15 0 50 2 0.300000
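The answer above hard-codes the group size of 5. As a hedged variation on the same idea, the base R sketch below derives the group index and the position within the group from the date column and then looks up the row at (previous group, previous position). The helper names `grp`, `pos`, `key` and `prev` are illustrative and not part of the original answer, and the data frame is rebuilt inline only to keep the example self-contained.

```r
# Sample data from the question
bb <- data.frame(
  date = rep(c("1/31/2013", "2/28/2013", "3/30/2013"), each = 5),
  bal  = c(10, 11, 12, 13, 14, 20, 30, 40, 50, 60, 10, 11, 12, 13, 15)
)

# Group index in order of appearance (1, 2, 3) and row position inside each group
grp <- match(bb$date, unique(bb$date))
pos <- ave(seq_along(grp), grp, FUN = seq_along)

# For every row, look up the row at (previous group, previous position);
# rows without such a partner get an NA index and therefore D = NA
key  <- paste(grp, pos)
prev <- paste(grp - 1, pos - 1)
bb$D <- bb$bal / bb$bal[match(prev, key)]

print(bb)
```

Because the lookup is keyed on group and position rather than on a fixed offset, this sketch should also return NA, rather than a wrong divisor, when a previous group is shorter than the current one.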
Answer 1 (score: 0)
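The OP has confirmed that the number of rows per date is always the same. Given this observation, a very simple solution is to lag the values of bal by 6 rows. Because this ignores the groups at first, the result D then has to be set to NA for the first row of each group, i.e. every 5th row starting from row 1.
Using data.table, this can be written as a "one-liner":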
library(data.table) # CRAN version 1.10.4 used
setDT(bb)[, D := bal / shift(bal, 6L)][seq(1L, nrow(bb), 5L), D := NA][]
which yields the expected result:
date bal D
1: 1/31/2013 10 NA
2: 1/31/2013 11 NA
3: 1/31/2013 12 NA
4: 1/31/2013 13 NA
5: 1/31/2013 14 NA
6: 2/28/2013 20 NA
7: 2/28/2013 30 3.000000
8: 2/28/2013 40 3.636364
9: 2/28/2013 50 4.166667
10: 2/28/2013 60 4.615385
11: 3/30/2013 10 NA
12: 3/30/2013 11 0.550000
13: 3/30/2013 12 0.400000
14: 3/30/2013 13 0.325000
15: 3/30/2013 15 0.300000
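The constants 6 and 5 in the one-liner follow from the group size n = 5: the lag is n + 1 rows and every n-th row is reset. A minimal base R sketch of the same technique, assuming equally sized and contiguous date groups, could compute n from the data instead. The names `sizes`, `n` and `lagged` are illustrative, and the data frame is rebuilt inline so the example runs on its own.

```r
# Sample data from the question
bb <- data.frame(
  date = rep(c("1/31/2013", "2/28/2013", "3/30/2013"), each = 5),
  bal  = c(10, 11, 12, 13, 14, 20, 30, 40, 50, 60, 10, 11, 12, 13, 15)
)

# Size of each contiguous date group; the approach only works if all are equal
sizes <- rle(as.character(bb$date))$lengths
stopifnot(length(unique(sizes)) == 1L)
n <- sizes[1]

# Lag bal by n + 1 rows: same date group minus one, position minus one
lagged <- c(rep(NA, n + 1L), head(bb$bal, -(n + 1L)))
bb$D <- bb$bal / lagged

# The first row of each group has no partner in the previous group
bb$D[seq(1L, nrow(bb), by = n)] <- NA

print(bb)
```

The `stopifnot()` guard is a design choice of this sketch: with unequal group sizes a fixed lag would silently pair the wrong rows, so it stops early instead.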