对角线乘积乘法

时间:2017-06-22 15:53:27

标签: r dataframe data.table dplyr

我想要每组日期的对角线乘法。

主要数据集:

 date      Bucket                 D            
1/31/2013   bkt 0                  NA
1/31/2013   bkt 1(10-20)           NA
1/31/2013   bkt 2(20-30)           NA
1/31/2013   bkt 3(30-40)           NA
1/31/2013   bkt 4(40+)             NA
2/28/2013   bkt 0                  NA
2/28/2013   bkt 1(10-20)          3.00
2/28/2013   bkt 2(20-30)          3.63
2/28/2013   bkt 3(30-40)          101
2/28/2013   bkt 4(40+)            102
3/30/2013   bkt 0                  NA
3/30/2013   bkt 1(10-20)          0.55
3/30/2013   bkt 2(20-30)          0.40
3/30/2013   bkt 3(30-40)          103
3/30/2013   bkt 4(40+)            104
4/31/2013   bkt 0                 NA
4/31/2013   bkt 1(10-20)          4.25              
4/31/2013   bkt 2(20-30)          3.65              
4/31/2013   bkt 3(30-40)          105        
4/31/2013   bkt 4(40+)            106        
5/30/2013   bkt 0                  NA
5/30/2013   bkt 1(10-20)          2.34      
5/30/2013   bkt 2(20-30)          4.10      
5/30/2013   bkt 3(30-40)          107                       
5/30/2013   bkt 4(40+)            108       
6/31/2013   bkt 0                 NA     
6/31/2013   bkt 1(10-20)          4         
6/31/2013   bkt 2(20-30)          5                
6/31/2013   bkt 3(30-40)          109       
6/31/2013   bkt 4(40+)            110       
7/30/2013   bkt 0                 NA
7/30/2013   bkt 1(10-20)          8         
7/30/2013   bkt 2(20-30)          7            
7/30/2013   bkt 3(30-40)         111           
7/30/2013   bkt 4(40+)           112        

Diagonal multiplication is as follows: 

1/31/2013 to 5/30/2013
2/28/2013 to 6/31/2013
3/30/2013 to 7/30/2013

each time we incrementing the next group of dates for diagonal product. 

 as so on ... the dates ranges from 1/31/2013 to 12/31/2016.

预期产出:

  date      Bucket                 D         DP
1/31/2013   bkt 0                  NA
1/31/2013   bkt 1(10-20)           NA
1/31/2013   bkt 2(20-30)           NA
1/31/2013   bkt 3(30-40)           NA
1/31/2013   bkt 4(40+)             NA
2/28/2013   bkt 0                  NA
2/28/2013   bkt 1(10-20)          3.00
2/28/2013   bkt 2(20-30)          3.63
2/28/2013   bkt 3(30-40)          101
2/28/2013   bkt 4(40+)            102
3/30/2013   bkt 0                  NA
3/30/2013   bkt 1(10-20)          0.55
3/30/2013   bkt 2(20-30)          0.40
3/30/2013   bkt 3(30-40)          103
3/30/2013   bkt 4(40+)            104
4/31/2013   bkt 0                 NA
4/31/2013   bkt 1(10-20)          4.25              
4/31/2013   bkt 2(20-30)          3.65              
4/31/2013   bkt 3(30-40)          105        
4/31/2013   bkt 4(40+)            106        
5/30/2013   bkt 0                  NA
5/30/2013   bkt 1(10-20)          2.34      13608   (108 * 105 *  0.40 *  3.00) 
5/30/2013   bkt 2(20-30)          4.10      4536    (4.10  108 * 105 *  0.40)       
5/30/2013   bkt 3(30-40)          107       11340   (108 * 105)                 
5/30/2013   bkt 4(40+)            108       108     (108)
6/31/2013   bkt 0                 NA     
6/31/2013   bkt 1(10-20)          4         23628.275  (110 * 107 * 3.65 * 0.55)   
6/31/2013   bkt 2(20-30)          5         42960.5    (110 * 107 * 3.65)         
6/31/2013   bkt 3(30-40)          109       1170       (110 * 109 )
6/31/2013   bkt 4(40+)            110       110        (100)
7/30/2013   bkt 0                 NA
7/30/2013   bkt 1(10-20)          8         216627.6   (112 * 109 * 4.10 * 4.25)
7/30/2013   bkt 2(20-30)          7         50971.2    (112 * 109 * 4.10)     
7/30/2013   bkt 3(30-40)         111        12432      (112 * 109)   
7/30/2013   bkt 4(40+)           112        112        (112)

在输出中我们只需要显示这些列:Date,Bucket,D和DP,因为DP是乘法的结果。 ()中的任何内容仅用于解释结果。无需在列中显示。

用户的错误代码:

d <- read.csv("lossrate.csv", header=TRUE)
> d$date = as.Date(d$date, format="%m/%d/%Y")
> r <- reshape2::dcast(data=d, Bucket ~ date, value.var="D")[-1, -2]
Aggregation function missing: defaulting to length
> mat <- as.matrix(r[-1])
> myD <- col(mat) - row(mat)
> rg <- range(myD)
> out <- sapply(seq(rg[1], rg[2]), function(x) 
+   `length<-`(rev(cumprod(rev(mat[myD==x]))), nrow(mat)))[,1:ncol(mat)]
> out[, colSums(is.na(out)) > 0] <- NA
> colnames(out) <- colnames(mat) # add dates as headers
> out <- reshape2::melt(cbind(r[1], out))
Using Bucket as id variables
> out <- merge(d, out, by.x=c("date", "Bucket"), by.y=c("variable",             "Bucket"), all=TRUE)
> output: 

         date       Bucket      D value
1  2013-01-31        bkt 0     NA    NA
2  2013-01-31 bkt 1(10-20)     NA    NA
3  2013-01-31 bkt 2(20-30)     NA    NA
4  2013-01-31 bkt 3(30-40)     NA    NA
5  2013-01-31   bkt 4(40+)     NA    NA
6  2013-02-28        bkt 0     NA    NA
7  2013-02-28 bkt 1(10-20)   3.00    NA
8  2013-02-28 bkt 2(20-30)   3.63    NA
9  2013-02-28 bkt 3(30-40) 101.00    NA
10 2013-02-28   bkt 4(40+) 102.00    NA
11 2013-03-30        bkt 0     NA    NA
12 2013-03-30 bkt 1(10-20)   0.55    NA
13 2013-03-30 bkt 2(20-30)   0.40    NA
14 2013-03-30 bkt 3(30-40) 103.00    NA
15 2013-03-30   bkt 4(40+) 104.00    NA
16 2013-05-30        bkt 0     NA    NA
17 2013-05-30 bkt 1(10-20)   2.34    NA
18 2013-05-30 bkt 2(20-30)   4.10    NA
19 2013-05-30 bkt 3(30-40) 107.00    NA
20 2013-05-30   bkt 4(40+) 108.00    NA
21 2013-07-30        bkt 0     NA    NA
22 2013-07-30 bkt 1(10-20)   8.00     1
23 2013-07-30 bkt 2(20-30)   7.00     1
24 2013-07-30 bkt 3(30-40) 111.00     1
25 2013-07-30   bkt 4(40+) 112.00     1
26       <NA>        bkt 0     NA    NA
27       <NA>        bkt 0     NA    NA
28       <NA> bkt 1(10-20)   4.25     2
29       <NA> bkt 1(10-20)   4.00     2
30       <NA> bkt 2(20-30)   5.00     2
31       <NA> bkt 2(20-30)   3.65     2
32       <NA> bkt 3(30-40) 109.00     2
33       <NA> bkt 3(30-40) 105.00     2
34       <NA>   bkt 4(40+) 106.00     2
35       <NA>   bkt 4(40+) 110.00     2

我只根据我的数据集更改了csv文件的名称。

1 个答案:

答案 0 :(得分:2)

# Change to date format so can be ordered
d$date = as.Date(d$date, format="%m/%d/%Y")
# Form matrix so easier to find diagonals
# [-1, -2] removes rows & columns of all NA
r <- reshape2::dcast(data=d, Bucket ~ date, value.var="D")[-1, -2]
# convert to matrix to allow row and col functions & remove non-numeric grouping
mat <- as.matrix(r[-1]) 

# Multiply diagonals
myD <- col(mat) - row(mat)
rg <- range(myD)
out <- sapply(seq(rg[1], rg[2]), function(x) 
          `length<-`(rev(cumprod(rev(mat[myD==x]))), nrow(mat)))[,1:ncol(mat)]
# remove if not needed: ensures four values in product
# not sure if needed: done to match expected outcome
out[, colSums(is.na(out)) > 0] <- NA 

# reshape
colnames(out) <- colnames(mat) # add dates as headers
out <- reshape2::melt(cbind(r[1], out))

# merge with original data
out <- merge(d, out, by.x=c("date", "Bucket"), by.y=c("variable", "Bucket"), all=TRUE)

哪个给出了

out[21:35,]

         date       Bucket      D     value
21 2013-05-30        bkt 0     NA        NA
22 2013-05-30 bkt 1(10-20)   2.34  13608.00
23 2013-05-30 bkt 2(20-30)   4.10   4536.00
24 2013-05-30 bkt 3(30-40) 107.00  11340.00
25 2013-05-30   bkt 4(40+) 108.00    108.00
26 2013-06-30        bkt 0     NA        NA
27 2013-06-30 bkt 1(10-20)   4.00  23628.28
28 2013-06-30 bkt 2(20-30)   5.00  42960.50
29 2013-06-30 bkt 3(30-40) 109.00  11770.00
30 2013-06-30   bkt 4(40+) 110.00    110.00
31 2013-07-30        bkt 0     NA        NA
32 2013-07-30 bkt 1(10-20)   8.00 212724.40
33 2013-07-30 bkt 2(20-30)   7.00  50052.80
34 2013-07-30 bkt 3(30-40) 111.00  12208.00
35 2013-07-30   bkt 4(40+) 112.00    112.00

数据(*更正了4月和6月的日期)

d <- 
    structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L), .Label = c("1/31/2013", 
"2/28/2013", "3/30/2013", "4/30/2013", "5/30/2013", "6/30/2013", 
"7/30/2013"), class = "factor"), Bucket = structure(c(1L, 2L, 
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 
5L), .Label = c("bkt 0", "bkt 1(10-20)", "bkt 2(20-30)", "bkt 3(30-40)", 
"bkt 4(40+)"), class = "factor"), D = c(NA, NA, NA, NA, NA, NA, 
3, 3.63, 101, 102, NA, 0.55, 0.4, 103, 104, NA, 4.25, 3.65, 105, 
106, NA, 2.34, 4.1, 107, 108, NA, 4, 5, 109, 110, NA, 8, 7, 111, 
112)), .Names = c("date", "Bucket", "D"), class = "data.frame", row.names = c(NA, 
-35L))