R: Summing auxiliary values over different subgroups in a data.table?

Time: 2019-07-16 19:08:56

Tags: r data.table

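Suppose you have a data.table with the fields date, myCategory, and revenue. You want each row's revenue as a proportion of all revenue, and also as a proportion of that day's revenue, along these lines: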

 b[,{
    #First auxiliary variable of all revenue
    totalRev = sum(revenue)                     #SUBGROUP OF ALL REV

    #Second auxiliary variable of revenue by date, syntax wrong! How to do this?
    {totalRev_date=sum(revenue), by=list(date)} #DIFFERENT SUBGROUP, by DATE's rev

    #Within the subgroup by date and myCategory, we will use 1st&2nd auxiliary vars
    .SD[,.(Revenue_prop_of_TOT=revenue/totalRev,
          ,Revenue_prop_of_DAY=revenue/totalRev_date)    ,by=list(myCategory,date)]
    },]

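We need to compute auxiliary sums: all revenue on a given date, and all revenue over the whole history.

The final result should look like this: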

date            myCategory       Revenue_prop_of_TOT         Revenue_prop_of_DAY
2019-01-01      Cat1             0.002                       0.2
...

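As you can see, the auxiliary variables are only helpers.

How do you do calculations over different subgroups in an R data.table?

3 answers:

Answer 0: (score: 1)

I hope I have understood your intent correctly; if you need a different output, let me know in the comments.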

library(data.table)

b = data.table(date = rep(seq.Date(Sys.Date()-99, Sys.Date(), "days"), each=2),
               myCategory = c("a", "b"),
               revenue = rnorm(200, 200))   # 100 dates x 2 categories = 200 rows


# global total, just create a constant
totalRev = b[, sum(revenue)]

# Total revenue at myCategory and date level / total Revenue
b[, Revenue_prop_of_TOT:=sum(revenue)/totalRev, by=.(myCategory, date)]

# you can calculate totalRev_date independently
b[, totalRev_date:=sum(revenue), by=date]

# If each (myCategory, date) pair has only one row, sum(revenue) and by are unnecessary here
b[, Revenue_prop_of_DAY:=sum(revenue)/totalRev_date, by=.(myCategory, date)]
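The last comment in the code above hints that the stored helper column and the `by=.(myCategory, date)` grouping can be dropped when every (myCategory, date) pair is a single row. A minimal sketch of that shortcut follows. The toy table and its values are made up for illustration and are not from the question.

```r
library(data.table)

# Hypothetical toy data: exactly one row per (date, myCategory) pair.
b <- data.table(
  date       = rep(as.Date("2019-01-01") + 0:1, each = 2),
  myCategory = rep(c("a", "b"), times = 2),
  revenue    = c(10, 30, 20, 40)
)

# Share of all revenue: no grouping, so sum() runs over the whole table.
b[, Revenue_prop_of_TOT := revenue / sum(revenue)]

# Share of the day's revenue: sum() is evaluated within each date group.
b[, Revenue_prop_of_DAY := revenue / sum(revenue), by = date]

print(b)
```

If a (myCategory, date) pair can span several rows, this gives per-row shares rather than per-pair shares, so the grouped `sum(revenue)` version above would be the safer choice.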

Finally, I would wrap this in a function.

revenue_total <- function(b){ 
  totalRev = b[, sum(revenue)]
  b[, Revenue_prop_of_TOT:=sum(revenue)/totalRev, by=.(myCategory, date)]
  b[, totalRev_date:=sum(revenue), by=date]
  b[, Revenue_prop_of_DAY:=sum(revenue)/totalRev_date, by=.(myCategory, date)]
  b
}

b = revenue_total(b)
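One thing to keep in mind: `:=` modifies a data.table by reference, so the function above adds the columns to the caller's `b` even without the reassignment. If that side effect is unwanted, a variant that works on a copy might look like the sketch below. The name `revenue_total_copy` and the toy data are hypothetical.

```r
library(data.table)

# Hypothetical variant of revenue_total() that leaves its input untouched.
revenue_total_copy <- function(dt) {
  out <- copy(dt)  # deep copy, so the := calls below cannot alter dt
  total_rev <- out[, sum(revenue)]
  out[, Revenue_prop_of_TOT := sum(revenue) / total_rev, by = .(myCategory, date)]
  out[, totalRev_date := sum(revenue), by = date]
  out[, Revenue_prop_of_DAY := sum(revenue) / totalRev_date, by = .(myCategory, date)]
  out[]
}

# Made-up input for illustration only.
b <- data.table(
  date       = rep(as.Date("2019-01-01") + 0:1, each = 2),
  myCategory = rep(c("a", "b"), times = 2),
  revenue    = c(10, 30, 20, 40)
)

res <- revenue_total_copy(b)
names(b)    # still only the three original columns
names(res)  # original columns plus the helper and the two proportions
```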

Answer 1: (score: 1)

Another option, using data.table::cube:

cb <- cube(DT, sum(value), by=c("date","category"), id=TRUE)

cb[grouping==0L, .(date, category,
    PropByDate = V1 / cb[grouping==1L][.SD, on="date", x.V1],
    PropByCategory = V1 / cb[grouping==2L][.SD, on="category", x.V1],
    PropByTotal = V1 / cb[grouping==3L, V1]
)]

Output:

   date category PropByDate PropByCategory PropByTotal
1:    1        1  0.3333333      0.2500000         0.1
2:    1        2  0.6666667      0.3333333         0.2
3:    2        1  0.4285714      0.7500000         0.3
4:    2        2  0.5714286      0.6666667         0.4

Data:

DT <- data.table(date=c(1, 1, 2, 2), category=c(1, 2, 1, 2), value=1:4)

#   date category value
#1:    1        1     1
#2:    1        2     2
#3:    2        1     3
#4:    2        2     4
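The filters `grouping==0L` through `grouping==3L` in this answer rely on the id column that `cube` adds when `id=TRUE`. The sketch below reuses the same `DT` and pulls each aggregation level apart, to show what each id stands for. The variable names are illustrative only.

```r
library(data.table)

DT <- data.table(date = c(1, 1, 2, 2), category = c(1, 2, 1, 2), value = 1:4)
cb <- cube(DT, sum(value), by = c("date", "category"), id = TRUE)

# grouping == 0: one row per (date, category) cell.
cells <- cb[grouping == 0L]

# grouping == 1: per-date totals; category is aggregated away (NA in cb).
by_date <- cb[grouping == 1L, .(date, V1)]

# grouping == 2: per-category totals; date is aggregated away.
by_category <- cb[grouping == 2L, .(category, V1)]

# grouping == 3: the grand total, a single value.
grand_total <- cb[grouping == 3L, V1]

print(cb)
```

Because `j` is the unnamed expression `sum(value)`, the aggregated column is called `V1`, which is why the answer divides by `V1` and `x.V1`.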

Answer 2: (score: 0)

Pivot and subtotal options in R

  1. cube, answered here

  2. Commented on by marbel here

  3. Grouping sets
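For the grouping sets option, here is a minimal sketch, assuming the item refers to `data.table::groupingsets`. It requests only the aggregation levels the question needs (cells, per-date totals, grand total) and reuses the toy `DT` from the cube answer. The output column names follow the question; the intermediate names are made up.

```r
library(data.table)

DT <- data.table(date = c(1, 1, 2, 2), category = c(1, 2, 1, 2), value = 1:4)

# Aggregate at three levels in one call: (date, category), date, and overall.
gs <- groupingsets(
  DT, j = sum(value), by = c("date", "category"),
  sets = list(c("date", "category"), "date", character(0)),
  id = TRUE
)

cells     <- gs[grouping == 0L]                         # per (date, category)
day_total <- gs[grouping == 1L, .(date, day_sum = V1)]  # per date
all_total <- gs[grouping == 3L, V1]                     # grand total

# Join the per-date totals onto the cells and form both proportions.
res <- cells[day_total, on = "date",
             .(date, category,
               Revenue_prop_of_TOT = V1 / all_total,
               Revenue_prop_of_DAY = V1 / day_sum)]
print(res)
```

Compared with `cube`, this skips the per-category totals, which the question does not ask for.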