Question

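I'm using R, and I have a data frame containing information about individuals' grant applications. Each individual can apply for a grant as many times as they like. I want to derive a new variable that tells me how many applications each person has made up to and including the date of the application that each record represents.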

Currently, my data looks like this:

app number  date app made     applicant
1           2012-08-01        John
2           2012-08-02        John
3           2012-08-02        Jane
4           2012-08-04        John
5           2012-08-08        Alice
6           2012-08-09        Alice
7           2012-08-09        Jane

I want to add another variable so that my data frame looks like this:

app number  date app made    applicant  applications by applicant to date
1           2012-08-01       John       1
2           2012-08-02       John       2
3           2012-08-02       Jane       1
4           2012-08-04       John       3
5           2012-08-08       Alice      1
6           2012-08-09       Alice      2
7           2012-08-09       Jane       2

I'm new to R and I'm struggling to work out how to do this. The closest I've got is the answer to this question: How do I count the number of observations at given intervals in R?

But I couldn't work out how to adapt it to use the date in each record rather than preset intervals.

Answer 1

You can use plyr. If your data is in a data.frame called dat, I'd add a column called count and then use cumsum:

library(plyr)
dat <- structure(list(number = 1:7, date = c("2012-08-01", "2012-08-02", 
"2012-08-02", "2012-08-04", "2012-08-08", "2012-08-09", "2012-08-09"
), name = c("John", "John", "Jane", "John", "Alice", "Alice", 
"Jane")), .Names = c("number", "date", "name"), row.names = c(NA, 
-7L), class = "data.frame")

# every row is one application
dat$count <- 1

# within each name, replace count with its running total
ddply(dat, .(name), transform, count=cumsum(count))

  number       date  name count
1      5 2012-08-08 Alice     1
2      6 2012-08-09 Alice     2
3      3 2012-08-02  Jane     1
4      7 2012-08-09  Jane     2
5      1 2012-08-01  John     1
6      2 2012-08-02  John     2
7      4 2012-08-04  John     3

I'm assuming your dates are already sorted, but you may want to sort them explicitly before computing the count:

dat <- dat[order(dat$date),]
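If you would rather not depend on plyr, the same running-total idea can be sketched in base R with ave(), which applies a function within each group and returns the results aligned with the original rows. This is a sketch, assuming the rows are sorted by date as above; here the dates are converted with as.Date (a choice made for this example, not required by the answer), and the column name count is kept from the answer.

```r
# Example data in the same shape as `dat` above
dat <- data.frame(
  number = 1:7,
  date   = as.Date(c("2012-08-01", "2012-08-02", "2012-08-02", "2012-08-04",
                     "2012-08-08", "2012-08-09", "2012-08-09")),
  name   = c("John", "John", "Jane", "John", "Alice", "Alice", "Jane"),
  stringsAsFactors = FALSE
)

# Make sure rows are in date order before counting
dat <- dat[order(dat$date), ]

# Running count per person: cumulative sum of a vector of 1s within each name.
# ave() puts each result back on its original row, so, unlike the ddply
# output, the rows are not regrouped by name.
dat$count <- ave(rep(1, nrow(dat)), dat$name, FUN = cumsum)

dat
```

Because the rows keep their original order, the new column lines up directly with the table in the question.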

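As noted in the comments, this can be simplified if you understand how transform works (I don't!), provided the rows are already sorted by date as above: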

ddply(dat, .(name), transform, count=order(date))
  number       date  name count
1      5 2012-08-08 Alice     1
2      6 2012-08-09 Alice     2
3      3 2012-08-02  Jane     1
4      7 2012-08-09  Jane     2
5      1 2012-08-01  John     1
6      2 2012-08-02  John     2
7      4 2012-08-04  John     3
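One caution about the order(date) shortcut: within each group, order() returns the indices that would sort the dates, not each row's position in the sorted order. The two coincide only when the rows are already in date order. The sketch below uses made-up, deliberately unsorted data (hypothetical, for illustration only) to show the difference in base R, and uses rank() with ties.method = "first", which gives each row's position whatever the input order.

```r
# Hypothetical data for one applicant, deliberately NOT sorted by date
d <- data.frame(
  date = as.Date(c("2012-08-09", "2012-08-01", "2012-08-04")),
  name = "John"
)

# order() gives the indices that would sort the dates: 2, 3, 1
by_order <- ave(as.numeric(d$date), d$name, FUN = order)

# rank() gives each row's position in date order: 3, 1, 2
by_rank <- ave(as.numeric(d$date), d$name,
               FUN = function(x) rank(x, ties.method = "first"))

by_order
by_rank
```

Only by_rank matches the application count here: the 2012-08-09 application is John's third, not his second.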

Answer 2

Here is a less elegant approach than @Justin's:

    A <- read.table(text='"app number"  "date app made"     "applicant"
    1           2012-08-01        John
    2           2012-08-02        John
    3           2012-08-02        Jane
    4           2012-08-04        John
    5           2012-08-08        Alice
    6           2012-08-09        Alice
    7           2012-08-09        Jane', header=TRUE)

    # order by applicant name (order() is stable, so each person's
    # applications stay in their original date order)
    A <- A[order(A$applicant), ]
    # number each applicant's rows 1, 2, ..., n; lapply (not sapply) always
    # returns a list, so unlist() flattens it even when all groups are equal size
    A$app2date <- unlist(lapply(unique(A$applicant), function(x) {
                           seq_len(sum(A$applicant == x))
                         }))
    # back in original order:
    A <- A[order(A$app.number), ]
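The sort-then-restore steps can be avoided by splitting the applicant column by applicant, numbering each piece with seq_along, and putting the pieces back with unsplit(), which returns each value to its original row. This is a minimal base-R sketch, assuming (like the code above) that rows are already in date order within each applicant; the column name app2date is kept from the answer.

```r
A <- data.frame(
  app.number = 1:7,
  applicant  = c("John", "John", "Jane", "John", "Alice", "Alice", "Jane"),
  stringsAsFactors = FALSE
)

# split() groups the rows by applicant; seq_along numbers each group 1..n
pieces <- lapply(split(A$applicant, A$applicant), seq_along)

# unsplit() reverses split(), putting each number back on its original row
A$app2date <- unsplit(pieces, A$applicant)

A
```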

Answer 3

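Here is a one-line approach using the ave function. This version does not require reordering the data; it keeps the rows in their original order: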

A$applications <- ave(A$app.number, A$applicant, FUN=seq_along)
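Both seq_along and cumsum number rows, so two applications by the same person on the same day get different counts (1 and 2). If "to date" should instead mean every application on or before that record's date, with same-day applications counted together, one way to sketch it in base R is to compare each row's date with the dates of the same applicant. This reading of the question is an assumption; the example data has no same-person, same-day repeats, so both interpretations give the same result there. The column name apps.to.date is hypothetical.

```r
A <- data.frame(
  app.number    = 1:7,
  date.app.made = as.Date(c("2012-08-01", "2012-08-02", "2012-08-02",
                            "2012-08-04", "2012-08-08", "2012-08-09",
                            "2012-08-09")),
  applicant     = c("John", "John", "Jane", "John", "Alice", "Alice", "Jane"),
  stringsAsFactors = FALSE
)

# For each row i, count rows with the same applicant whose date is on or
# before row i's date. The result does not depend on the row order.
A$apps.to.date <- vapply(seq_len(nrow(A)), function(i) {
  sum(A$applicant == A$applicant[i] & A$date.app.made <= A$date.app.made[i])
}, integer(1))

A
```

This compares every row with every other row, so its cost grows quadratically with the number of rows; for large data the ave() one-liner is much cheaper when row-based numbering is acceptable.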

How do I derive a variable in R showing the number of observations with the same value recorded on earlier dates?
