Question

I have a script in which I use ddply, as in the following example:

ddply(df, .(col),
  function(x) data.frame(
    col1=some_function(x$y),
    col2=some_other_function(x$y)
  )
)

Within ddply, is it possible to reuse col1 without calling the whole function again?

For example:

ddply(df, .(col),
  function(x) data.frame(
    col1=some_function(x$y),
    col2=some_other_function(x$y),
    col3=col1*col2
  )
)

Answer 1

You have a whole function to play with! It doesn't have to be a one-liner! This should work:

ddply(df, .(col), function(x) {
  tmp <- some_other_function(x$y)
  data.frame(
    col1=some_function(x$y),
    col2=tmp,
    col3=tmp
  )
})
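
As a runnable sketch of the same idea with the product from the question filled in: both results are stored in local variables and then reused when building the data frame. The data frame `df` and the use of `mean` and `sd` as stand-ins for `some_function` and `some_other_function` are assumptions made purely for illustration.

```r
library(plyr)

# Hypothetical data: two groups with three observations each
df <- data.frame(col = c("a", "a", "a", "b", "b", "b"),
                 y   = c(1, 2, 3, 10, 20, 30))

res <- ddply(df, .(col), function(x) {
  # Compute each summary once and keep it in a local variable
  col1 <- mean(x$y)  # stand-in for some_function
  col2 <- sd(x$y)    # stand-in for some_other_function
  # Reuse the stored values instead of calling the functions again
  data.frame(col1 = col1, col2 = col2, col3 = col1 * col2)
})
res
```

Each function is called only once per group, and col3 is built from the stored values.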

Answer 2

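This seems like a good candidate for data.table, using the scoping rules of the j component. See FAQ 2.8 for details.

From the FAQ:

No anonymous function is passed to j. Instead, an anonymous body is passed to j.

So, for your case: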

library(data.table)
DT <- as.data.table(df)
DT[,{
 col1=some_function(y)
 col2=some_other_function(y)
 col3= col1 *col2
 list(col1 = col1, col2 = col2, col3 = col3)
 }, by = col]

Or, slightly more directly:

DT[,list(
 col1=col1<-some_function(y),
 col2=col2<-some_other_function(y),
 col3=col1*col2
 ), by = col]

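This avoids repeating col1 and col2, and avoids two repetitions of col3; repetition is something we strive to reduce in data.table. The = followed by <- might look cumbersome at first. However, it allows the following syntactic sugar: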

DT[,list(
 "Projected return (%)"=      col1<-some_function(y),
 "Investment ($m)"=           col2<-some_other_function(y),
 "Return on Investment ($m)"= col1*col2
 ), by = col]

For example, the output could then be sent directly to LaTeX or HTML.
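
To try the braced j form on concrete data, here is a minimal sketch. The table `DT` and the use of `mean` and `sd` in place of `some_function` and `some_other_function` are assumptions for illustration, not part of the original question.

```r
library(data.table)

# Hypothetical data: two groups with three observations each
DT <- data.table(col = c("a", "a", "a", "b", "b", "b"),
                 y   = c(1, 2, 3, 10, 20, 30))

res <- DT[, {
  # The body of j is evaluated per group, so these are ordinary local variables
  col1 <- mean(y)  # stand-in for some_function
  col2 <- sd(y)    # stand-in for some_other_function
  list(col1 = col1, col2 = col2, col3 = col1 * col2)
}, by = col]
res
```

Because j is an expression body rather than a function call, intermediate values defined earlier in the body are visible when the final list is built.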

Answer 3

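I don't think that is possible, but it shouldn't matter too much, because at that point it is no longer an aggregation function. For example: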

# use summarize() in ddply()
data.means <- ddply(data, .(groups), summarize, mean = mean(x), sd = sd(x), n = length(x))
# derive the remaining columns from the aggregated ones (R is case-sensitive: the column is `se`)
data.means$se <- data.means$sd / sqrt(data.means$n)
data.means$Upper <- data.means$mean + (data.means$se * 1.96)
data.means$Lower <- data.means$mean - (data.means$se * 1.96)

So I didn't calculate the SE directly, but computing it outside of ddply() isn't so bad. If you really wanted to, you could also do

ddply(data, .(groups), summarize, se = sd(x) / sqrt(length(x)))

Or, to put it in terms of your example:

ddply(df, .(col), summarize,
      col1=some_function(y),
      col2=some_other_function(y),
      col3=some_function(y)*some_other_function(y)
    )
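
The first approach in this answer (aggregate inside ddply(), then derive further columns outside it) can be sketched for the question's columns as follows. The data frame `df` and the use of `mean` and `sd` as stand-ins for `some_function` and `some_other_function` are assumptions for illustration.

```r
library(plyr)

# Hypothetical data: two groups with three observations each
df <- data.frame(col = c("a", "a", "a", "b", "b", "b"),
                 y   = c(1, 2, 3, 10, 20, 30))

# Aggregate once per group
res <- ddply(df, .(col), summarize,
             col1 = mean(y),  # stand-in for some_function
             col2 = sd(y))    # stand-in for some_other_function

# Derive col3 from the already aggregated columns, outside ddply()
res$col3 <- res$col1 * res$col2
res
```

This works because col3 depends only on the per-group results, so it can be computed row by row on the summary table.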

Is it possible to reuse a generated column in ddply?
