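Lookup in the data.table package using the current record's value as a reference?

Time: 2017-11-11 17:31:24

Tags: r dataframe data.table

I recently started using the data.table package, and I am having some trouble with a lookup. Here is the data: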

     Date MonthNo Unique Items Amounts Total
 1:   Jan       1    AAA     x      10    10
 2:   Jan       1    BBB     y       2     0
 3:   Feb       2    CCC     x       3     3
 4:   Feb       2    DDD     y      15     0
 5: March       3    AAA     y      20     0
 6: March       3    BBB     x      35    35
 7: April       4    CCC     x      15    15
 8: April       4    AAA     y      50     0
 9:   May       5    BBB     x      60    60
10:   May       5    CCC     y      70     0
11:  June       6    DDD     x     100   100
12:  June       6    AAA     y      20     0

Basically, I want to create a new column called PYTD that holds, for each Unique in each month, the Total from the previous month only. For example:

    Date MonthNo Unique Items Amounts Total  PYTD

 7: April       4    CCC     x      15    3

Here is my code so far:

Sys.setlocale("LC_CTYPE", "en_US.UTF-8")
library(data.table)
data <- read.csv("sample.csv")
df <- as.data.frame(data)
#str(df)
dt <- data.table(df)
dt
#str(dt)
dt$Total = ifelse(dt$Items == "x",dt$Amounts,0)

dtgrouped2 = dt[, lapply(.SD, sum, na.rm=TRUE), by=list(MonthNo,Unique),
                .SDcol=c("Total")]

dtgrouped2$PYTD <- dtgrouped2[MonthNo == (dtgrouped2$MonthNo-1)
                                  & Unique == dtgrouped2$Unique,Total]

Unfortunately, dtgrouped2$PYTD gives me NAs.
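A likely cause, sketched on a small hypothetical table (`grouped` and its values are made up for illustration): inside `[...]`, `MonthNo` and `dtgrouped2$MonthNo` refer to the same column, so the filter compares every row with its own value minus one, which can never be true.

```r
library(data.table)

# Hypothetical miniature of the grouped table, for illustration only
grouped <- data.table(MonthNo = c(1L, 2L, 3L),
                      Unique  = c("AAA", "AAA", "AAA"),
                      Total   = c(10, 3, 0))

# Row k is compared with its own MonthNo minus one, never with another row
cond <- grouped$MonthNo == (grouped$MonthNo - 1)
print(cond)

# The same filter used as a row selector therefore keeps no rows
matched <- grouped[MonthNo == (grouped$MonthNo - 1) & Unique == grouped$Unique,
                   Total]
print(length(matched))
```

Because the selector matches nothing, there is no value to assign for any row, which suggests the lookup needs a join or a lag rather than a filter against the same table.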

This is the final result I am looking for:

   MonthNo Unique Total PYTD
 1:       1    AAA    10   NA
 2:       1    BBB     0   NA
 3:       2    CCC     3   NA
 4:       2    DDD     0   NA
 5:       3    AAA     0   10
 6:       3    BBB    35    0
 7:       4    CCC    15    3
 8:       4    AAA     0    0
 9:       5    BBB    60   35
10:       5    CCC     0   15
11:       6    DDD   100    0
12:       6    AAA     0    0

1 Answer:

Answer 0: (score: 0)

You can merge the data with itself after incrementing MonthNo on the computed sums:

# create fake data
library(data.table)
set.seed(0)
dt <- data.table(MonthNo = rep(1:4, each = 3),
                 Unique = LETTERS[1:2],
                 Total = runif(12))
dt

    MonthNo Unique      Total
 1:       1      A 0.89669720
 2:       1      B 0.26550866
 3:       1      A 0.37212390
 4:       2      B 0.57285336
 5:       2      A 0.90820779
 6:       2      B 0.20168193
 7:       3      A 0.89838968
 8:       3      B 0.94467527
 9:       3      A 0.66079779
10:       4      B 0.62911404
11:       4      A 0.06178627
12:       4      B 0.20597457

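# sum Total per (Unique, MonthNo), move each sum forward one month,
# then join back onto dt so each row picks up the previous month's sum
dt[, list(PYTD = sum(Total)), by = list(Unique, MonthNo)
   ][, MonthNo := MonthNo + 1L][
     dt, on = .(MonthNo, Unique)]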

    Unique MonthNo      PYTD      Total
 1:      A       1        NA 0.89669720
 2:      B       1        NA 0.26550866
 3:      A       1        NA 0.37212390
 4:      B       2 0.2655087 0.57285336
 5:      A       2 1.2688211 0.90820779
 6:      B       2 0.2655087 0.20168193
 7:      A       3 0.9082078 0.89838968
 8:      B       3 0.7745353 0.94467527
 9:      A       3 0.9082078 0.66079779
10:      B       4 0.9446753 0.62911404
11:      A       4 1.5591875 0.06178627
12:      B       4 0.9446753 0.20597457
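
As a rough sketch, the same self-join can be run on the question's own table, typed in by hand here with Total already computed. Note that the question asks for the "previous month", while its desired table appears to use the previous occurrence of each Unique (for example, AAA in month 3 gets 10 from month 1). The second variant, a per-group lag with `shift()`, is an alternative swapped in for that reading and is not part of the answer above. The names `prev`, `strict` and `lagged` are illustrative.

```r
library(data.table)

# The question's table, entered by hand (Total as given in the question)
dt <- data.table(
  MonthNo = rep(1:6, each = 2),
  Unique  = c("AAA", "BBB", "CCC", "DDD", "AAA", "BBB",
              "CCC", "AAA", "BBB", "CCC", "DDD", "AAA"),
  Total   = c(10, 0, 3, 0, 0, 35, 15, 0, 60, 0, 100, 0)
)

# Variant 1: strict previous month, using the same self-join as above
prev <- dt[, .(PYTD = sum(Total)), by = .(Unique, MonthNo)]
prev[, MonthNo := MonthNo + 1L]              # sums now line up with the next month
strict <- prev[dt, on = .(MonthNo, Unique)]  # NA where Unique is absent in month - 1

# Variant 2: previous occurrence of each Unique (assumes rows are ordered by MonthNo)
lagged <- copy(dt)
lagged[, PYTD := shift(Total), by = Unique]  # lag by one row within each Unique

print(strict)
print(lagged)
```

On this data the strict join fills PYTD only where the same Unique also appears in the immediately preceding month (rows 8 and 10), while the `shift()` variant reproduces the PYTD column of the desired table in the question.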