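Cumulative sequence of occurrences of values

Time: 2013-03-05 17:37:16

Tags: r sequence

I have a dataset that looks like this, in which one column can take four different values: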

dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))

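In R, I would like to create a second column that counts, in row order, how many times each value has appeared so far. The output column would therefore look like this: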

out
1
1
1
2
1
2
2
3
2
3
3
4

2 answers:

Answer 0 (score: 14)

Try this:

dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))
with(dataset, ave(as.character(out), out, FUN = seq_along))
# [1] "1" "1" "1" "2" "1" "2" "2" "3" "2" "3" "3" "4"

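Of course, you can assign the output to a column in your data.frame, using something like the following (wrapped in as.numeric, because ave returns character values here):

dataset$asNumbers <- as.numeric(with(dataset, ave(as.character(out), out, FUN = seq_along)))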
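As a possible alternative, the sketch below avoids the character-to-number conversion by passing an integer row index to ave() as its first argument, so the values of out are only used for grouping. The column name count is chosen here for illustration and does not come from the original answer.

```r
# Same data as in the question
dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))

# ave() returns a vector of the same type as its first argument, so feeding it
# an integer index (rather than the character/factor column) yields integers.
# 'count' is an illustrative column name.
dataset$count <- ave(seq_along(dataset$out), dataset$out, FUN = seq_along)

dataset$count
# [1] 1 1 1 2 1 2 2 3 2 3 3 4
```

This works the same way whether out is stored as a character vector or as a factor.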

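Update

The "dplyr" approach is also quite nice. The logic is very similar to the "data.table" approach. One advantage is that you don't need to wrap the output in as.numeric, as is required with the ave approach mentioned above.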

library(dplyr)
dataset %>% group_by(out) %>% mutate(count = sequence(n()))
# Source: local data frame [12 x 2]
# Groups: out
# 
#    out count
# 1    a     1
# 2    b     1
# 3    c     1
# 4    a     2
# 5    d     1
# 6    b     2
# 7    c     2
# 8    a     3
# 9    d     2
# 10   b     3
# 11   c     3
# 12   a     4

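A third option is to use getanID from my "splitstackshape" package. For this particular example, you only need to specify the data.frame name (since it has a single column). In general, though, you would be more specific and name the columns that currently serve as "ids"; the function then checks whether they are unique or whether a cumulative sequence is needed to make them unique.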

library(splitstackshape)
# getanID(dataset, "out")  ## Example of being specific about column to use
getanID(dataset)
#     out .id
#  1:   a   1
#  2:   b   1
#  3:   c   1
#  4:   a   2
#  5:   d   1
#  6:   b   2
#  7:   c   2
#  8:   a   3
#  9:   d   2
# 10:   b   3
# 11:   c   3
# 12:   a   4
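To illustrate the idea of numbering rows within a combination of several "id" columns, here is a rough base-R analogue. It is not how getanID is implemented; the data frame df and its columns id1 and id2 are hypothetical and only serve the example.

```r
# Hypothetical data with two id columns (not from the original post)
df <- data.frame(id1 = c("a", "a", "b", "a", "b"),
                 id2 = c(1, 1, 1, 2, 1))

# Number the rows within each (id1, id2) combination using base R.
# ave() accepts several grouping vectors and groups by their interaction.
df$.id <- ave(seq_len(nrow(df)), df$id1, df$id2, FUN = seq_along)

df$.id
# [1] 1 2 1 1 2
```

A combination that occurs only once, such as ("a", 2) here, simply gets the value 1.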

Answer 1 (score: 7)

Update

As Ananda pointed out, you can use a simpler approach:

 DT[, counts := sequence(.N), by = "V1"]

(with DT as defined below)
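For readers unfamiliar with sequence(), the short base-R sketch below shows what it returns. Inside a data.table grouping, .N is the number of rows in the current group, so sequence(.N) should produce 1, 2, ..., .N for each group.

```r
# sequence() concatenates 1:n for each element n of its argument
sequence(4)
# [1] 1 2 3 4
sequence(c(3, 2))
# [1] 1 2 3 1 2

# For a single group size n, the result matches seq_len(n)
all(sequence(4) == seq_len(4))
# [1] TRUE
```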


You can create a "counts" column, initialized to 1, and then compute the cumulative sum by factor. Here is a quick implementation with data.table:
# Called the column V1
dataset<-data.frame(V1=c("a","b","c","a","d","b","c","a","d","b","c","a"))

library(data.table)

DT <- data.table(dataset)

DT[, counts := 1L]
DT[, counts := cumsum(counts), by=V1]; DT

#     V1 counts
#  1:  a      1
#  2:  b      1
#  3:  c      1
#  4:  a      2
#  5:  d      1
#  6:  b      2
#  7:  c      2
#  8:  a      3
#  9:  d      2
# 10:  b      3
# 11:  c      3
# 12:  a      4
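The same "start at 1, then take a cumulative sum per group" idea can also be sketched in base R without data.table. This is only an illustration of the technique, using ave() with cumsum, and it assumes the same V1 column as above.

```r
dataset <- data.frame(V1 = c("a","b","c","a","d","b","c","a","d","b","c","a"))

# Start every row at 1, then take the running total within each level of V1
dataset$counts <- ave(rep(1L, nrow(dataset)), dataset$V1, FUN = cumsum)

dataset$counts
# [1] 1 1 1 2 1 2 2 3 2 3 3 4
```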