Question

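I'm trying to do the equivalent of SQL group by summaries in R using the plyr function named ddply. My data frame has three columns (say id, period and event). I'd like to count how many times each id appears in the data frame (count(*) ... group by id in SQL) and get the last element of the event column for each id.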

Here is an example of what I have and what I'd like to obtain:

  id period event #original data frame
  1      1     1
  2      1     0
  2      2     1
  3      1     1
  4      1     1
  4      1     0

  id  t  x #what I want to obtain
  1   1  1
  2   2  1
  3   1  1
  4   2  0

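This is the simple code I've been using:

 teachers.pp<-read.table("http://www.ats.ucla.edu/stat/examples/alda/teachers_pp.csv", sep=",", header=T) # whole data frame
 datos1=ddply(teachers.pp,.(id),function(x) c(t=length(x$id), x=x[length(x$id),3])) #This is working fine.

Now, I've been reading The Split-Apply-Combine Strategy for Data Analysis, which gives an example using summarise with a syntax equivalent to the one below:

 datos2=ddply(teachers.pp,.(id), summarise, t=length(id), x=teachers.pp[length(id),3]) #using summarise but the result is not what I want.

This is the data frame obtained as datos2:

  id  t  x
  1   1  1
  2   2  0
  3   1  1
  4   1  1

So, my question is: why does this result differ from the one I get with the first chunk of code, I mean datos1? What am I doing wrong?

It's not clear to me when I should use summarise and when transform. Could you tell me the correct syntax for the ddply function?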

Answer 1

When using summarise, stop referring to the original data frame. Instead, just write expressions in terms of the column names.

You tried this:

ddply(teachers.pp,.(id), summarise, t=length(id), x=teachers.pp[length(id),3])

when what you wanted was something more like this:

ddply(teachers.pp,.(id), summarise, t=length(id), x=tail(event,1))
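To see why the two calls differ, here is a minimal sketch on the small example table from the question rather than the real teachers.pp file. The data frame name toy and the result names wrong and right are made up for illustration. Inside summarise, length(id) is the size of the current group, so toy[length(id), 3] picks that row number out of the whole data frame, whichever id is being processed.

```r
library(plyr)

# Toy data copied from the question's example table (not the real teachers.pp)
toy <- data.frame(
  id     = c(1, 2, 2, 3, 4, 4),
  period = c(1, 1, 2, 1, 1, 1),
  event  = c(1, 0, 1, 1, 1, 0)
)

# Indexing the full data frame: for a group with n rows this returns
# row n of toy, not the last row of the group
wrong <- ddply(toy, .(id), summarise, t = length(id), x = toy[length(id), 3])

# Referring to the column by name: inside summarise, event holds only
# the values belonging to the current id
right <- ddply(toy, .(id), summarise, t = length(id), x = tail(event, 1))

print(wrong)
print(right)
```

On this toy data the two results disagree for id 2: the group has two rows, so the first call returns the event value in row 2 of the whole table (0), while tail(event, 1) returns the last event recorded for id 2 (1).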
