Question

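I have a data frame whose column names are generated from parameters, so I don't know their exact values in advance. I'd like to pass these columns to ddply as parameters as well. I suspect the answer is obvious, but could someone shed some light on it?

The example below uses the iris dataset to show what I'm trying to do, along with the unexpected result of my attempt. The first example, iris1, gives the result I want. However, passing the column name as an argument, as in my iris2 attempt, does not give the expected result.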

library(plyr)  # ddply() comes from the plyr package

iris1 <- ddply(iris, .(Species), transform, pw_first = Petal.Width[1], 
              pw_last = Petal.Width[length(Petal.Width)])
myCol <- 'Petal.Width'
iris2 <- ddply(iris, .(Species), transform, pw_first = myCol[1], 
               pw_last = myCol[length(myCol)])

head(iris1)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species pw_first pw_last
# 1          5.1         3.5          1.4         0.2  setosa      0.2     0.2
# 2          4.9         3.0          1.4         0.2  setosa      0.2     0.2
# 3          4.7         3.2          1.3         0.2  setosa      0.2     0.2
# 4          4.6         3.1          1.5         0.2  setosa      0.2     0.2
# 5          5.0         3.6          1.4         0.2  setosa      0.2     0.2
# 6          5.4         3.9          1.7         0.4  setosa      0.2     0.2

head(iris2)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    pw_first     pw_last
# 1          5.1         3.5          1.4         0.2  setosa Petal.Width Petal.Width
# 2          4.9         3.0          1.4         0.2  setosa Petal.Width Petal.Width
# 3          4.7         3.2          1.3         0.2  setosa Petal.Width Petal.Width
# 4          4.6         3.1          1.5         0.2  setosa Petal.Width Petal.Width
# 5          5.0         3.6          1.4         0.2  setosa Petal.Width Petal.Width
# 6          5.4         3.9          1.7         0.4  setosa Petal.Width Petal.Width
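The iris2 output follows from how R evaluates the expression as written. myCol is an ordinary character vector of length 1, so myCol[1] and myCol[length(myCol)] are both the string "Petal.Width", not the column it names. The base-R check below needs no plyr. The setosa subset is only an illustration and stands in for one ddply group:

```r
myCol <- "Petal.Width"

# myCol is a plain length-1 character vector: indexing it returns the
# string itself, not data from the column it names.
as_string <- myCol[1]            # "Petal.Width"

# To reach the column, use the string as a lookup key with [[ instead.
setosa <- iris[iris$Species == "setosa", ]
pw <- setosa[[myCol]]
pw_first <- pw[1]                # 0.2
pw_last  <- pw[length(pw)]       # 0.2
```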

Answer 1

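colName <- "Petal.Width"

iris1 <- ddply(iris, .(Species), function(x) {
  pw.first <- x[1, colName]          # first value of the chosen column in this group
  pw.last  <- x[nrow(x), colName]    # last value of the chosen column in this group
  cbind(x, pw.first, pw.last)        # append both values to every row of the group
})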

unique(iris1$pw.first)
[1] 0.2 1.4 2.5

unique(iris1$pw.last)
[1] 0.2 1.3 1.8

If you only want Species, pw.first and pw.last, just remove x from the cbind call.
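The same one-row-per-species summary can be sketched in base R as well. Here split/lapply/do.call stand in for ddply, so this is not the answer's plyr code, and the anonymous function simply returns only Species and the two values:

```r
colName <- "Petal.Width"

# Split iris into one data frame per species (a base-R stand-in for
# ddply's grouping), then keep only the species and the two summaries.
pieces <- split(iris, iris$Species)
summary_df <- do.call(rbind, lapply(names(pieces), function(sp) {
  x <- pieces[[sp]]
  data.frame(Species  = sp,
             pw.first = x[1, colName],
             pw.last  = x[nrow(x), colName])
}))
summary_df
```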

Answer 2

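Here you go. The idea in this solution is to use get, which looks up an object by name in the current environment. Inside transform, the columns of the data frame being processed are visible as variables, so get(myCol), which is get("Petal.Width"), returns that column rather than the string "Petal.Width".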

myCol <- 'Petal.Width'
iris2 <- ddply(iris, .(Species), transform, 
  pw_first = get(myCol)[1],
  pw_last = get(myCol)[length(get(myCol))]
)
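To see what get(myCol) does on its own, here is a base-R sketch. with() and a hand-made versicolor subset stand in for transform and a ddply group. It relies on transform() evaluating its arguments with the data frame's columns as variables, the same way with() does:

```r
myCol <- "Petal.Width"
versicolor <- iris[iris$Species == "versicolor", ]

# with() evaluates the expression with the data frame's columns visible as
# variables. Inside it, get(myCol) is get("Petal.Width"), which finds the
# column; myCol itself is found in the enclosing (global) environment.
pw <- with(versicolor, get(myCol))

pw_first <- pw[1]             # 1.4
pw_last  <- pw[length(pw)]    # 1.3
```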

Another approach, which may be easier to understand:

iris2 <- ddply(iris, .(Species), function(df){
  x = df[[myCol]]
  transform(df, pw_first = x[1], pw_last = x[length(x)])
})
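The df[[myCol]] version works because [[ evaluates its argument, while df$myCol would look for a column literally named "myCol". The sketch below uses the same idea without plyr, with split/lapply/do.call swapped in for ddply. The helper name add_first_last is hypothetical:

```r
myCol <- "Petal.Width"

# Hypothetical helper: look the column up with [[ (which evaluates myCol),
# then add its first and last values to every row of the group.
add_first_last <- function(df) {
  x <- df[[myCol]]
  transform(df, pw_first = x[1], pw_last = x[length(x)])
}

# Split by Species, apply the helper to each piece, and stack the results.
iris2 <- do.call(rbind, lapply(split(iris, iris$Species), add_first_last))
rownames(iris2) <- NULL
head(iris2)
```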

Answer 3

Still learning R, but I find that ddply's function interface fits my brain... maybe this is close?

iris1 <- ddply(iris, 
               .(Species), 
               function(x,y) {result = data.frame(x$Petal.Width[1],
                                                  x$Petal.Width[length(x$Petal.Width)])
                              names(result) <- y
                              return(result)},
               c('first','last'))
iris1

Result:

     Species first last
1     setosa   0.2  0.2
2 versicolor   1.4  1.3
3  virginica   2.5  1.8

Or perhaps this?

iris1 <- ddply(iris, 
               .(Species), 
               function(x,y) {
                 result = cbind(x,x$Petal.Width[1],x$Petal.Width[length(x$Petal.Width)])
                 names(result) = c(names(x),y)
                 return(result)
                 },
               c('first','last'))
head(iris1)

Result:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species first last
1          5.1         3.5          1.4         0.2  setosa   0.2  0.2
2          4.9         3.0          1.4         0.2  setosa   0.2  0.2
3          4.7         3.2          1.3         0.2  setosa   0.2  0.2
4          4.6         3.1          1.5         0.2  setosa   0.2  0.2
5          5.0         3.6          1.4         0.2  setosa   0.2  0.2
6          5.4         3.9          1.7         0.4  setosa   0.2  0.2

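OK, that makes more sense now: you want to pass the name of an existing column of the data.frame as an argument, then use that column as the source for computing two new columns that are added to the data.frame. How about: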

iris1 <- ddply(iris, 
               .(Species), 
               function(x,y) {
                 len <- length(x[,1])
                 first <- x[1,y]
                 last <- x[len,y]
                 result <- cbind(x,first,last)
                 names(result) <- c(names(x),'first','last')
                 return(result)
               },
               'Petal.Width'
)
head(iris1)

Result:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species first last
1          5.1         3.5          1.4         0.2  setosa   0.2  0.2
2          4.9         3.0          1.4         0.2  setosa   0.2  0.2
3          4.7         3.2          1.3         0.2  setosa   0.2  0.2
4          4.6         3.1          1.5         0.2  setosa   0.2  0.2
5          5.0         3.6          1.4         0.2  setosa   0.2  0.2
6          5.4         3.9          1.7         0.4  setosa   0.2  0.2
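The code above works because ddply forwards extra arguments (here 'Petal.Width') to the function it applies. A base-R sketch of the same pattern follows, with lapply swapped in for ddply (lapply also forwards extra arguments to its function). Running it for two columns shows that the column really is a parameter. The helper first_last is hypothetical:

```r
# Hypothetical helper: adds the first and last value of the column whose
# name is given in `col` to one group's data frame.
first_last <- function(x, col) {
  vals <- x[[col]]
  x$first <- vals[1]
  x$last  <- vals[length(vals)]
  x
}

# lapply() forwards the extra argument (the column name) to first_last(),
# much as ddply() forwards its extra arguments to the function it applies.
by_species <- split(iris, iris$Species)
pw <- do.call(rbind, lapply(by_species, first_last, "Petal.Width"))
sl <- do.call(rbind, lapply(by_species, first_last, "Sepal.Length"))
head(pw)
```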

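I hope you'll be doing something other than "first" and "last" with this function, such as mean or sd. First and last depend on ddply handing the data to the anonymous function in a known order, and I'm not sure that is guaranteed. You could get different, unexpected answers.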
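The order sensitivity can be illustrated in base R, with tapply used here instead of ddply. Reversing the row order changes the per-species first value but leaves the mean unchanged. If "first" and "last" are meaningful, the rows therefore need to be in a known order before grouping. first_of is a hypothetical helper:

```r
myCol <- "Petal.Width"

# The same data with the row order reversed.
rev_iris <- iris[nrow(iris):1, ]

# Hypothetical helper: the first element of a vector.
first_of <- function(v) v[1]

# The first value per species depends on the row order...
first_orig <- tapply(iris[[myCol]], iris$Species, first_of)
first_rev  <- tapply(rev_iris[[myCol]], rev_iris$Species, first_of)

# ...whereas the mean per species does not.
mean_orig <- tapply(iris[[myCol]], iris$Species, mean)
mean_rev  <- tapply(rev_iris[[myCol]], rev_iris$Species, mean)
```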

ddply: how do I pass column names as arguments?
