Get a variable from text by group inside an apply function

Time: 2017-02-21 14:27:11

Tags: r dataframe apply

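I'm trying to apply a scaling function to a data.frame by 'category' group. The scaling function needs a specific scalar that depends on the category. The calculation is: each value of 't' is divided by the sum of 't' for its 'cat', then multiplied by the scalar matching that 'cat' (i.e. rows with cat 'a' use 'fac.a', and so on).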
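Written as a formula (the symbols are chosen here for illustration), the scaled value for row i is:

```latex
\text{scaled}_i = \frac{t_i}{\sum_{j \,:\, \text{cat}_j = \text{cat}_i} t_j} \times \text{fac}_{\text{cat}_i}
```

Because the shares within a category sum to 1, the scaled values of a category add up to its factor, e.g. 15 for 'a'.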

I'm using 'get' to call the specific scalar, but it only uses the value for row 1 and applies it to the whole data.frame:

# my scaling factors:
fac.a <- 15
fac.b <- 12
fac.c <- 20

# dummy data.frame
set.seed(10)
df <- data.frame(t = sample(1:100,15),cat = rep(c("a","b","c"),each=5))

# apply function that groups & sums the df$t values by df$cat, divides each df$t by its
# respective category total and applies the correct scalar with a get function.
df$scaled <- apply(df[1], 2, function(x) (df$t/ave(df$t, df$cat, FUN=sum))*get(paste0("fac.",df$cat)) )

Unfortunately, I only get the correct answer for the first category, because the get function only calls the first scalar.
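`get()` looks up a single object by name, so it cannot return one scalar per row from the vector `paste0("fac.", df$cat)`. That is why only `fac.a` ends up being used. Here is a minimal base-R sketch that vectorises the lookup with `mget()`, which accepts a character vector of names, and drops the unnecessary `apply()` call. It assumes the `fac.*` objects live in the global environment; `row_fac` is just an illustrative name.

```r
# scaling factors, as in the question
fac.a <- 15
fac.b <- 12
fac.c <- 20

set.seed(10)
df <- data.frame(t = sample(1:100, 15), cat = rep(c("a", "b", "c"), each = 5))

# mget() returns one list element per name, so every row gets its own category's factor
# (assumption: the fac.* objects are defined in the global environment)
row_fac <- unlist(mget(paste0("fac.", df$cat), envir = globalenv()), use.names = FALSE)

# divide each t by its category total, then multiply by the matching factor
df$scaled <- df$t / ave(df$t, df$cat, FUN = sum) * row_fac
```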

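This can be done relatively easily in 4 to 5 lines (by building separate attributes, etc.), but I'd like to achieve it within an apply function.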
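If the goal is specifically an apply-family solution, one option among several is sketched below. It splits the data by category with `split()`, scales each piece with `lapply()`, and reassembles the results in the original row order with `unsplit()`. Inside each piece `g$cat[1]` is a single name, so `get()` behaves as intended. `scaled_by_cat` is an illustrative name.

```r
fac.a <- 15; fac.b <- 12; fac.c <- 20
set.seed(10)
df <- data.frame(t = sample(1:100, 15), cat = rep(c("a", "b", "c"), each = 5))

# one list element per category; cat is constant within a piece,
# so get() receives exactly one name
scaled_by_cat <- lapply(split(df, df$cat), function(g) {
  g$t / sum(g$t) * get(paste0("fac.", g$cat[1]))
})

# put the per-category results back in the original row order
df$scaled <- unsplit(scaled_by_cat, df$cat)
```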

N.B. Why is the new attribute called 't' in the data.frame, but 'scaled' when I check names(df)?
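A likely explanation for the N.B., shown here with a simplified function: `apply(df[1], 2, ...)` returns a one-column matrix whose column name comes from `df[1]`, i.e. `"t"`. Assigning it with `df$scaled <-` stores the whole matrix as a single column called `scaled`. `names(df)` therefore reports `scaled`, while the printed data.frame shows the matrix's inner column name. Taking the first column stores a plain vector instead. The name `m` is illustrative.

```r
set.seed(10)
df <- data.frame(t = sample(1:100, 15), cat = rep(c("a", "b", "c"), each = 5))

# apply() over the single column of df[1] returns a 15 x 1 matrix
m <- apply(df[1], 2, function(x) x / sum(x))
dim(m)       # 15 1
colnames(m)  # "t"

# assigning the matrix keeps it as a matrix column named "scaled"
df$scaled <- m
is.matrix(df$scaled)  # TRUE

# extracting the first column stores an ordinary numeric vector instead
df$scaled <- m[, 1]
is.matrix(df$scaled)  # FALSE
```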

1 answer:

Answer 0 (score: 1)

We can use data.table:

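library(data.table)
# newt holds each category's total; grouping by row (1:nrow(df)) means
# get() receives a single 'fac.*' name per call
setDT(df)[, newt := sum(t), cat][, 
   scaled := (t/newt) * get(paste0('fac.', cat)), 1:nrow(df)][, newt := NULL][]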
#     t cat    scaled
# 1:  51   a 3.8059701
# 2:  31   a 2.3134328
# 3:  42   a 3.1343284
# 4:  68   a 5.0746269
# 5:   9   a 0.6716418
# 6:  22   b 1.1046025
# 7:  26   b 1.3054393
# 8:  94   b 4.7196653
# 9:  57   b 2.8619247
#10:  40   b 2.0083682
#11:  59   c 3.6875000
#12: 100   c 6.2500000
#13:  10   c 0.6250000
#14:  52   c 3.2500000
#15:  99   c 6.1875000
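A sketch of a variant that groups by `cat` instead of by row. Inside `j`, a `by` column holds the single value of the current group, so `get()` still sees one name, and no helper `newt` column is needed. The sketch assumes the `fac.*` objects are in the global environment; `DT` is an illustrative name.

```r
library(data.table)

fac.a <- 15; fac.b <- 12; fac.c <- 20
set.seed(10)
DT <- data.table(t = sample(1:100, 15), cat = rep(c("a", "b", "c"), each = 5))

# within each group, cat is length 1, so get() resolves exactly one fac.* object
DT[, scaled := t / sum(t) * get(paste0("fac.", cat)), by = cat]
```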

Alternatively, a faster option is to create a key/value dataset and join it with the original data to create the 'scaled' column:

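# key/value table: cat = "a", "b", "c" with values = 15, 12, 20
df2 <- setnames(setDT(stack(mget(ls(pattern="fac\\.")))[2:1]),
                      1, "cat")[, cat := sub(".*\\.", "", cat)][]
# join on cat; by = .EACHI computes sum(t) over each category's matching rows
setDT(df)[df2, scaled := (t/sum(t))*values, on = .(cat), by = .EACHI]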
df
#      t cat    scaled
# 1:  51   a 3.8059701
# 2:  31   a 2.3134328
# 3:  42   a 3.1343284
# 4:  68   a 5.0746269
# 5:   9   a 0.6716418
# 6:  22   b 1.1046025
# 7:  26   b 1.3054393
# 8:  94   b 4.7196653
# 9:  57   b 2.8619247
#10:  40   b 2.0083682
#11:  59   c 3.6875000
#12: 100   c 6.2500000
#13:  10   c 0.6250000
#14:  52   c 3.2500000
#15:  99   c 6.1875000
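For comparison, here is a base-R sketch of the same key/value idea without data.table. It collects the `fac.*` objects into a named vector keyed by category and indexes it by `cat`. The `as.character()` call matters: before R 4.0, `data.frame()` creates factors by default, and indexing a named vector with a factor uses the factor's integer codes rather than its labels. `fac` is an illustrative name, and the sketch assumes the `fac.*` objects are in the global environment.

```r
fac.a <- 15; fac.b <- 12; fac.c <- 20
set.seed(10)
df <- data.frame(t = sample(1:100, 15), cat = rep(c("a", "b", "c"), each = 5))

# key/value lookup: collect fac.a, fac.b, fac.c and rename them to "a", "b", "c"
fac <- unlist(mget(ls(pattern = "^fac\\.", envir = globalenv()), envir = globalenv()))
names(fac) <- sub("^fac\\.", "", names(fac))

# each t over its category total, times the factor looked up by category label
df$scaled <- df$t / ave(df$t, df$cat, FUN = sum) * unname(fac[as.character(df$cat)])
```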