Passing user input to 'by' in data.table and to a formula in reshape (R)

Asked: 2017-07-23 23:11:18

Tags: r data.table reshape

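Below is an example of what I am trying to do. eval(substitute(*)) works well, as shown in a linked post, but it makes the code harder to read. I am wondering whether there is a better approach that I am not aware of.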

I want to be able to choose the row and column variables of the final table. So, if I have

input.col <- 'Gender'
input.row <- 'Region'

I would like to pass these parameters to data.table instead of hard-coding Region and Gender:

library(data.table)
library(reshape)
set.seed(5)
DT <- data.table(
  Region = sample(x = c('Asia', 'Americas', 'Africa', 'Europe', 'Oceania'), size = 200, replace = TRUE),
  Weight = runif(n = 200, min = 1, max = 5),
  Age    = round(x = 10 * rexp(n = 200, rate = 1), digits = 0),
  Gender = sample(x = c('Male', 'Female', 'Gender diverse'), size = 200, replace = TRUE,
                  prob = c(0.49, 0.49, 0.02))
)
cast(data = DT[, sum(Weight), .(Region, Gender)], formula = Region~Gender, fun.aggregate = sum, value = 'V1')

I would like to end up with the following table:

Region   Female Gender diverse     Male
1   Africa 32.95019       3.222125 77.50863
2 Americas 49.12787       0.000000 84.97214
3     Asia 41.04879       0.000000 55.43294
4   Europe 45.39469       4.296767 47.76714
5  Oceania 65.89198       1.439075 72.27496

Thanks!

2 Answers:

Answer 0 (score: 4)

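Here are some possibilities. Except for (3), they use only data.table and base R. All of them perform the aggregation and the reshaping in a single operation, so there is no need to use by in the first place. If you do want to use by for some reason, this works, because by accepts a character vector of column names and the formula can be supplied as a string: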

cast(data = DT[, sum(Weight), by = c(input.row, input.col)], 
     formula = paste(input.row, "~", input.col), fun.aggregate = sum, value = 'V1')
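If the grouping step is needed in several places, it may be convenient to wrap it in a small function. The sketch below is only an illustration: the helper name sum_by and the toy table are hypothetical and not part of the question. It assumes that by accepts a character vector held in a variable and that .SDcols selects the value column by name.

```r
library(data.table)

# Hypothetical helper (not from the question): sum the column named in
# `value_col` within each combination of the columns named in `by_cols`.
sum_by <- function(dt, value_col, by_cols) {
  # `by` takes a character vector of column names;
  # `.SDcols` restricts .SD to the column we want to sum.
  dt[, lapply(.SD, sum), by = by_cols, .SDcols = value_col]
}

# Toy data, made up for illustration only
toy <- data.table(
  Region = c("Asia", "Asia", "Asia", "Europe"),
  Gender = c("Male", "Male", "Female", "Male"),
  Weight = c(1, 2, 3, 4)
)

input.row <- "Region"
input.col <- "Gender"
res <- sum_by(toy, "Weight", c(input.row, input.col))
print(res)
```

The result is still in long format, with one row per Region/Gender combination, so it would need to be cast afterwards to obtain the wide table.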

1) data.table::dcast

dcast(DT, paste(input.row, "~", input.col), sum, value.var = "Weight")

giving:

     Region   Female Gender diverse     Male
1:   Africa 32.95019       3.222125 77.50863
2: Americas 49.12787       0.000000 84.97214
3:     Asia 41.04879       0.000000 55.43294
4:   Europe 45.39469       4.296767 47.76714
5:  Oceania 65.89198       1.439075 72.27496
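The same idea can be packaged as a reusable function. This is a minimal sketch under stated assumptions: the function name cross_sum and the toy table are hypothetical, and reformulate() from base R is swapped in for paste() to build the formula. It assumes the column names are syntactically valid R names (no spaces).

```r
library(data.table)

# Hypothetical wrapper (not from the original answer): cross-tabulate the sum
# of `value_var` with `row_var` as rows and `col_var` as columns.
cross_sum <- function(dt, row_var, col_var, value_var) {
  # reformulate() builds the formula row_var ~ col_var from strings
  fo <- reformulate(col_var, response = row_var)
  # Empty cells are filled by calling fun.aggregate on a zero-length vector,
  # which for sum is 0.
  dcast(dt, fo, fun.aggregate = sum, value.var = value_var)
}

# Toy data, made up for illustration only
toy <- data.table(
  Region = c("Asia", "Asia", "Europe"),
  Gender = c("Male", "Female", "Male"),
  Weight = c(1, 2, 3)
)

wide <- cross_sum(toy, row_var = "Region", col_var = "Gender", value_var = "Weight")
print(wide)
```

Passing the column names as function arguments keeps the user input in one place, which is what the question asks for with input.row and input.col.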

2) xtabs. xtabs is part of base R:

fo <- sprintf("Weight ~ %s + %s", input.row, input.col)
xtabs(fo, DT)

giving:

          Gender
Region        Female Gender diverse      Male
  Africa   32.950187       3.222125 77.508626
  Americas 49.127873       0.000000 84.972137
  Asia     41.048787       0.000000 55.432941
  Europe   45.394693       4.296767 47.767138
  Oceania  65.891983       1.439075 72.274955

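3) reshape::cast. We use the reshape package because the question does, but note that it has been superseded by the reshape2 package, where dcast would be used instead; dcast is also implemented in data.table, as in (1).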

cast(DT, paste(input.row, "~", input.col), sum, value = "Weight")

giving:

    Region   Female Gender diverse     Male
1   Africa 32.95019       3.222125 77.50863
2 Americas 49.12787       0.000000 84.97214
3     Asia 41.04879       0.000000 55.43294
4   Europe 45.39469       4.296767 47.76714
5  Oceania 65.89198       1.439075 72.27496

4) tapply (the default argument requires R 3.4.0 or later)

tapply(DT$Weight, as.list(DT)[c(input.row, input.col)], sum, default = 0)

giving:

          Gender
Region       Female Gender diverse     Male
  Africa   32.95019       3.222125 77.50863
  Americas 49.12787       0.000000 84.97214
  Asia     41.04879       0.000000 55.43294
  Europe   45.39469       4.296767 47.76714
  Oceania  65.89198       1.439075 72.27496

Answer 1 (score: 3)

You can use get and rename the variables, so that the fixed names can then be used in the formula:

input.col <- 'Gender'
input.row <- 'Region'

dt <- cast(data = DT[, sum(Weight), .(row = get(input.row), col = get(input.col))], 
#                                     ^^^   ^^^             ^^^   ^^^  
           formula = row ~ col, fun.aggregate = sum, value = 'V1')

dt
#       row   Female Gender diverse     Male
#1   Africa 32.95019       3.222125 77.50863
#2 Americas 49.12787       0.000000 84.97214
#3     Asia 41.04879       0.000000 55.43294
#4   Europe 45.39469       4.296767 47.76714
#5  Oceania 65.89198       1.439075 72.27496