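Below is an example of what I am trying to do. eval(substitute(*))
works fine, as shown here, but it makes the code harder to read. I am wondering whether there is a better approach that I am not aware of.
I would like to be able to choose the row and column variables of the final table. So, if I have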
input.col <- 'Gender'
input.row <- 'Region'
I would like to be able to pass these as arguments to data.table instead of writing Region and Gender explicitly.
library(data.table)
library(reshape)
set.seed(5)
DT <- data.table(
  Region = sample(x = c('Asia', 'Americas', 'Africa', 'Europe', 'Oceania'), size = 200, replace = TRUE),
  Weight = runif(n = 200, min = 1, max = 5),
  Age = round(x = 10*rexp(n = 200, rate = 1), digits = 0),
  Gender = sample(x = c('Male', 'Female', 'Gender diverse'), size = 200, replace = TRUE, prob = c(0.49, 0.49, 0.02))
)
cast(data = DT[, sum(Weight), .(Region, Gender)], formula = Region~Gender, fun.aggregate = sum, value = 'V1')
I would like to end up with the following table:
    Region   Female Gender diverse     Male
1   Africa 32.95019       3.222125 77.50863
2 Americas 49.12787       0.000000 84.97214
3     Asia 41.04879       0.000000 55.43294
4   Europe 45.39469       4.296767 47.76714
5  Oceania 65.89198       1.439075 72.27496
Thanks!
Answer 0 (score: 4)
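Here are some possibilities. Except for (3), which uses reshape, they need no package other than data.table. Every approach performs the aggregation and the reshaping in a single operation, so there is no need to aggregate with by first. If you do want to use by for some reason, this will work: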
cast(data = DT[, sum(Weight), by = c(input.row, input.col)],
formula = paste(input.row, "~", input.col), fun.aggregate = sum, value = 'V1')
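As a small illustration of the by form above, here is a sketch on a made-up four-row table. The names toy and total are hypothetical and not from the question; only input.row and input.col are. It assumes data.table is installed and shows by taking the grouping columns as a character vector:

```r
library(data.table)

# Hypothetical toy data, small enough to check by hand
toy <- data.table(
  Region = c("Asia", "Asia", "Asia", "Europe"),
  Gender = c("Male", "Male", "Female", "Male"),
  Weight = c(1, 3, 2, 4)
)

input.row <- "Region"
input.col <- "Gender"

# by accepts the grouping columns as a character vector,
# so no eval(substitute()) is needed at this step
agg <- toy[, .(total = sum(Weight)), by = c(input.row, input.col)]
print(agg)
```

The result keeps the original column names (Region, Gender), so a formula built from the same two strings can be applied to it afterwards.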
1) data.table::dcast
dcast(DT, paste(input.row, "~", input.col), sum, value.var = "Weight")
giving:
     Region   Female Gender diverse     Male
1:   Africa 32.95019       3.222125 77.50863
2: Americas 49.12787       0.000000 84.97214
3:     Asia 41.04879       0.000000 55.43294
4:   Europe 45.39469       4.296767 47.76714
5:  Oceania 65.89198       1.439075 72.27496
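If pasting strings together feels fragile, one alternative (not part of the answer above, just a base R option) is reformulate, which builds a formula object from character vectors. A minimal sketch, reusing input.row and input.col from the question:

```r
input.row <- "Region"
input.col <- "Gender"

# reformulate(termlabels, response) builds the formula response ~ termlabels
fo <- reformulate(input.col, response = input.row)

class(fo)    # a formula object rather than a string
deparse(fo)  # its text form, for inspection
```

The resulting fo could then be passed where the paste(...) string is used in (1).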
2) xtabs
xtabs is part of base R:
fo <- sprintf("Weight ~ %s + %s", input.row, input.col)
xtabs(fo, DT)
giving:
          Gender
Region        Female Gender diverse      Male
  Africa   32.950187       3.222125 77.508626
  Americas 49.127873       0.000000 84.972137
  Asia     41.048787       0.000000 55.432941
  Europe   45.394693       4.296767 47.767138
  Oceania  65.891983       1.439075 72.274955
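To reuse the xtabs approach with different variables, it could be wrapped in a small helper. This is only a sketch: the function name crosstab, its arguments, and the toy data frame are made up for illustration, and it uses base R only:

```r
# Hypothetical toy data, small enough to check by hand
toy <- data.frame(
  Region = c("Asia", "Asia", "Asia", "Europe"),
  Gender = c("Male", "Male", "Female", "Male"),
  Weight = c(1, 3, 2, 4)
)

# crosstab is a hypothetical helper: sum `value` over every row/col combination
crosstab <- function(data, row, col, value) {
  fo <- as.formula(sprintf("%s ~ %s + %s", value, row, col))
  xtabs(fo, data = data)
}

tab <- crosstab(toy, "Region", "Gender", "Weight")
print(tab)
```

Combinations that do not occur in the data (here Europe and Female) come out as 0, matching the behaviour shown in the output above.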
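3) reshape::cast
We use the reshape package here because the question does, but note that it has been superseded by the reshape2 package, where one would use dcast instead; however, dcast is also implemented in data.table, as shown in (1).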
cast(DT, paste(input.row, "~", input.col), sum, value = "Weight")
giving:
    Region   Female Gender diverse     Male
1   Africa 32.95019       3.222125 77.50863
2 Americas 49.12787       0.000000 84.97214
3     Asia 41.04879       0.000000 55.43294
4   Europe 45.39469       4.296767 47.76714
5  Oceania 65.89198       1.439075 72.27496
4) tapply
tapply(DT$Weight, as.list(DT)[c(input.row, input.col)], sum, default = 0)
giving:
          Gender
Region       Female Gender diverse     Male
  Africa   32.95019       3.222125 77.50863
  Americas 49.12787       0.000000 84.97214
  Asia     41.04879       0.000000 55.43294
  Europe   45.39469       4.296767 47.76714
  Oceania  65.89198       1.439075 72.27496
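The default = 0 argument in (4) is what fills the empty cells. A small sketch on a made-up data frame shows the difference; toy is hypothetical, and because it is a plain data.frame rather than a data.table, the grouping columns can be picked with toy[c(...)] directly. Note that default requires R 3.4.0 or later:

```r
# Hypothetical toy data; there is no Europe/Female row
toy <- data.frame(
  Region = c("Asia", "Asia", "Asia", "Europe"),
  Gender = c("Male", "Male", "Female", "Male"),
  Weight = c(1, 3, 2, 4)
)

input.row <- "Region"
input.col <- "Gender"

# Without default, combinations that never occur are NA
res_na <- tapply(toy$Weight, toy[c(input.row, input.col)], sum)

# With default = 0, the same cells are filled with 0
res <- tapply(toy$Weight, toy[c(input.row, input.col)], sum, default = 0)

print(res_na)
print(res)
```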
Answer 1 (score: 3)
You can use get, renaming the variables at the same time so that fixed names can be used in the formula:
input.col <- 'Gender'
input.row <- 'Region'
dt <- cast(data = DT[, sum(Weight), .(row = get(input.row), col = get(input.col))],
# ^^^ ^^^ ^^^ ^^^
formula = row ~ col, fun.aggregate = sum, value = 'V1')
dt
# row Female Gender diverse Male
#1 Africa 32.95019 3.222125 77.50863
#2 Americas 49.12787 0.000000 84.97214
#3 Asia 41.04879 0.000000 55.43294
#4 Europe 45.39469 4.296767 47.76714
#5 Oceania 65.89198 1.439075 72.27496
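One side effect visible in the output is that the first column is now called row rather than Region. A possible way around that, sketched here on made-up data with data.table's own dcast instead of reshape::cast (so this is a variation, not the answer's exact code), is to rename the column back afterwards with setnames. The names toy, agg and wide are hypothetical:

```r
library(data.table)

# Hypothetical toy data, small enough to check by hand
toy <- data.table(
  Region = c("Asia", "Asia", "Asia", "Europe"),
  Gender = c("Male", "Male", "Female", "Male"),
  Weight = c(1, 3, 2, 4)
)

input.row <- "Region"
input.col <- "Gender"

# Aggregate under the fixed names row/col, as in the answer
agg <- toy[, .(V1 = sum(Weight)),
           by = .(row = get(input.row), col = get(input.col))]

# Reshape with a fixed formula; fill = 0 for combinations that do not occur
wide <- dcast(agg, row ~ col, value.var = "V1", fill = 0)

# Restore the original name of the row variable
setnames(wide, "row", input.row)
print(wide)
```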