Add a partial cumulative sum column to a data frame

Time: 2015-01-05 09:52:43

Tags: r dataframe

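My df is sorted by product ID (prodID) and by date (Date). I need to add a column that shows a running index for each prodID within the df. For example: if a prodID appears only once, the index for that row is 1. If another prodID appears in 3 rows (which are consecutive, because the df is sorted), the index should be 1 in the first row for that prodID, then 2 and 3 in the following rows. Basically I need my initial df: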

 initial.df <- structure(list(prodID = c("009hpOpzwl", "00An0zNeEQ", "00An0zNeEQ", "00An0zNeEQ", "00An0zNeEQ", "00DtU3Bk6O", "00DtU3Bk6O", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk"), Date = c("2012-06", "2014-09", "2014-09", "2014-09", "2014-09", "2001-11", "2001-11", "2002-11", "2002-12", "2003-01", "2003-02", "2003-03"), status = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L), .Label = c("rare", "occasional", "amateur", "connoisseur", "expert", "fool"), class = "factor"),     rating = c(2.5, 4.7, 4.7, 4.7, 4.7, 4.4, 4.4, 3.5, 3.83,     3.36, 3.53, 3.78), over = c(68, 49, 49, 49, 49, 22, 22, 29,     38.33, 43.3, 39.53, 30.58)), class = "data.frame", row.names = c(NA, -12L), .Names = c("prodID", "Date", "status", "rating", "over"))

to become

new.df <- structure(list(prodID = c("009hpOpzwl", "00An0zNeEQ", "00An0zNeEQ", "00An0zNeEQ", "00An0zNeEQ", "00DtU3Bk6O", "00DtU3Bk6O", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk"), Date = c("2012-06", "2014-09", "2014-09", "2014-09", "2014-09", "2001-11", "2001-11", "2002-11", "2002-12", "2003-01", "2003-02", "2003-03"), status = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L), .Label = c("rare", "occasional", "amateur", "connoisseur", "expert", "fool"), class = "factor"),     rating = c(2.5, 4.7, 4.7, 4.7, 4.7, 4.4, 4.4, 3.5, 3.83,     3.36, 3.53, 3.78), over = c(68, 49, 49, 49, 49, 22, 22, 29,     38.33, 43.3, 39.53, 30.58), index = c(1, 1, 2, 3, 4, 1, 2,     1, 2, 3, 4, 5)), .Names = c("prodID", "Date", "status", "rating", "over", "index"), row.names = c(NA, -12L), class = "data.frame")

Thanks in advance for any suggestions.

4 Answers:

Answer 0: (score: 3)

If the data is guaranteed to be sorted the way you describe, you can use the ave function for this:

# Pass an integer vector as the first argument so the index is numeric rather than character
initial.df$index <- ave(seq_along(initial.df$prodID), initial.df$prodID, FUN = seq_along)
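To illustrate how ave numbers rows within groups, here is a minimal sketch on toy data. The `toy` data frame and its `id` column are hypothetical and not part of the question. The ids are deliberately left unsorted to show that ave keeps the original row order while counting within each group.

```r
# Hypothetical toy data (not from the question); ids are not sorted on purpose
toy <- data.frame(id = c("a", "b", "a", "b", "a"), stringsAsFactors = FALSE)

# ave() returns a vector of the same type as its first argument, so an
# integer vector is passed there; seq_along() then counts within each id group
toy$index <- ave(seq_along(toy$id), toy$id, FUN = seq_along)

toy$index
# 1 1 2 2 3
```

The counter follows row order within each group, so sorting by date first still matters if the index is meant to be chronological.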

Answer 1: (score: 3)

For completeness: this is a very simple operation with data.table, which is efficient, has a short syntax, and creates the column by reference. Simply:

library(data.table)
setDT(initial.df)[, index := seq_len(.N), prodID]
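What "by reference" means here can be seen in a small sketch, again on hypothetical toy data (the `toy` object and `id` column are made up for illustration). It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data (not from the question)
toy <- data.frame(id = c("a", "a", "b", "a"), stringsAsFactors = FALSE)

# setDT() converts the data.frame to a data.table in place, without copying
setDT(toy)

# := adds the column to toy itself; .N is the number of rows in each group
toy[, index := seq_len(.N), by = id]

toy$index
# 1 2 1 3
```

No assignment with `<-` is needed, because `:=` modifies `toy` directly. The same applies to initial.df above, which stays a data.table after the call.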

Answer 2: (score: 2)

In case this question is not a duplicate of some other one, and since we already have a data.table answer, here is a dplyr version:

library(dplyr)
initial.df %>% group_by(prodID) %>% mutate(index = row_number())
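Unlike the data.table approach, a dplyr pipeline returns a new object and leaves its input unchanged, so the result has to be assigned. A minimal sketch on hypothetical toy data (`toy`, `toy_indexed`, and `id` are illustrative names), assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data (not from the question)
toy <- data.frame(id = c("a", "a", "b", "a"), stringsAsFactors = FALSE)

toy_indexed <- toy %>%
  group_by(id) %>%                  # count separately for each id
  mutate(index = row_number()) %>%  # 1, 2, 3, ... within the group
  ungroup()                         # drop the grouping so later steps are not grouped

toy_indexed$index
# 1 2 1 3
```

The `ungroup()` call is optional here; it only keeps the grouping from carrying over into later operations.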

Answer 3: (score: 1)

How about this?
# Name the new column explicitly; otherwise cbind labels it "1:nrow(x)"
do.call(rbind, lapply(split(initial.df, initial.df$prodID), function(x) cbind(x, index = seq_len(nrow(x)))))