R: How to count rows by condition and put the result in a new column

Time: 2014-06-29 10:44:13

Tags: r

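I've been trying to solve a seemingly simple problem in R. Given the dataset called DATA shown below, I want to count the number of occurrences of each value in the DATA$ID column and put those counts into a new column. For example, the first entry of the new DATA$NEW column would be 19, because that ID occurs 19 times. I can't seem to work out how to do this.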

       PRAY NOTES        ID DURATION
    1    NA  <NA> 1_MENS_10       60
    2    NA  <NA> 1_MENS_10       60
    3    NA  <NA> 1_MENS_10       60
    4    NA  <NA> 1_MENS_10       60
    5    NA  <NA> 1_MENS_10       60
    6    NA  <NA> 1_MENS_10       60
    7    NA  <NA> 1_MENS_10       60
    8    NA  <NA> 1_MENS_10       60
    9    NA  <NA> 1_MENS_10       60
    10   NA  <NA> 1_MENS_10       60
    11   NA  <NA> 1_MENS_10       60
    12   NA  <NA> 1_MENS_10       60
    13   NA  <NA> 1_MENS_10       60
    14   NA  <NA> 1_MENS_10       60
    15   NA  <NA> 1_MENS_10       60
    16   NA  <NA> 1_MENS_10       60
    17   NA  <NA> 1_MENS_10       60
    18   NA  <NA> 1_MENS_10       60
    19   NA  <NA> 1_MENS_10       60
    20    2  <NA> 1_MENS_14       61
    21    3  <NA> 1_MENS_14       61
    22    2  <NA> 1_MENS_14       61
    23    1  <NA> 1_MENS_14       61
    24    1  <NA> 1_MENS_14       61
    25    3  <NA> 1_MENS_14       61
    26    2  <NA> 1_MENS_14       61
    27    3  <NA> 1_MENS_14       61
    28    1  <NA> 1_MENS_14       61
    29    3  <NA> 1_MENS_14       61
    30    3  <NA> 1_MENS_14       61

Here is the dput output:

structure(list(PRAY = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L, 3L, 2L, 1L, 1L, 3L, 2L, 
3L, 1L, 3L, 3L), NOTES = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "<NA>", class = "factor"), 
    ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L), .Label = c("1_MENS_10", "1_MENS_14"), class = "factor"), 
    DURATION = c(60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 
    60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 61L, 61L, 
    61L, 61L, 61L, 61L, 61L, 61L, 61L, 61L, 61L), NEW = c(19L, 
    19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 
    19L, 19L, 19L, 19L, 19L, 19L, 11L, 11L, 11L, 11L, 11L, 11L, 
    11L, 11L, 11L, 11L, 11L)), .Names = c("PRAY", "NOTES", "ID", 
"DURATION", "NEW"), row.names = c(NA, -30L), class = "data.frame")

3 answers:

Answer 0 (score: 3)

Using the data.table package:

library(data.table)
setDT(DATA)[, NEW := .N, by = ID]

DATA
##    PRAY NOTES        ID DURATION NEW
## 1:   NA  <NA> 1_MENS_10       60  19
## 2:   NA  <NA> 1_MENS_10       60  19
## 3:   NA  <NA> 1_MENS_10       60  19
## 4:   NA  <NA> 1_MENS_10       60  19
## 5:   NA  <NA> 1_MENS_10       60  19
## 6:   NA  <NA> 1_MENS_10       60  19
## 7:   NA  <NA> 1_MENS_10       60  19
....

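setDT converts a data.frame to a data.table by reference (meaning no copy of the data is made), so it is very fast. We then group by ID and add a new column with NEW := .N, where .N is a built-in special variable that holds the number of rows in each group.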
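
To make the grouping step concrete, here is a minimal sketch on a small made-up data frame. The name toy and its values are assumptions for illustration only, not the asker's data.

```r
library(data.table)

# Hypothetical toy data: three rows for ID "a", two rows for ID "b"
toy <- data.frame(ID = c("a", "a", "a", "b", "b"),
                  DURATION = c(60L, 60L, 60L, 61L, 61L))

setDT(toy)                  # converts toy to a data.table in place
toy[, NEW := .N, by = ID]   # .N is the number of rows in each ID group

print(toy)
```

Because := assigns by reference, there is no need to write the result back with <-; toy itself gains the NEW column.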

  

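Note: as of v1.9.3, the setDF function is exported, so you can convert back to a data.frame, again by reference. So if for some reason you want to stick with a data.frame, you can call setDF(.) on the result.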
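
A short sketch of that round trip, assuming a data.table version that exports setDF. The toy data frame is hypothetical.

```r
library(data.table)

# Hypothetical toy data
toy <- data.frame(ID = c("a", "a", "b"))

setDT(toy)[, NEW := .N, by = ID]  # add the per-ID counts as a data.table
setDF(toy)                        # convert back to a plain data.frame in place

class(toy)
```

After setDF, toy is an ordinary data.frame again and keeps the NEW column.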

Answer 1 (score: 2)

plyr can do this easily and lets you stick with data frames:

library(plyr)
# DATA is the data frame from the question
DATA <- ddply(DATA, .(ID), transform, NEW = length(ID))
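
As a self-contained illustration of the same call, here is a sketch on a made-up data frame. The names toy and out and the values are assumptions, not part of the original answer.

```r
library(plyr)

# Hypothetical toy data: three rows for ID "a", two rows for ID "b"
toy <- data.frame(ID = c("a", "a", "a", "b", "b"),
                  DURATION = c(60L, 60L, 60L, 61L, 61L))

# Split by ID, add NEW = number of rows in the piece, then recombine
out <- ddply(toy, .(ID), transform, NEW = length(ID))

print(out)
```

Unlike the data.table version, ddply returns a new data frame, so the result has to be assigned to a name.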

Answer 2 (score: 1)

Here is the dplyr equivalent, for the sake of completeness:

library(dplyr)

DATA <- DATA %>% group_by(ID) %>% mutate(ID_Counts = n())

head(DATA)
#Source: local data frame [6 x 6]
#Groups: ID
#
#  PRAY NOTES        ID DURATION NEW ID_Counts
#1   NA  <NA> 1_MENS_10       60  19        19
#2   NA  <NA> 1_MENS_10       60  19        19
#3   NA  <NA> 1_MENS_10       60  19        19
#4   NA  <NA> 1_MENS_10       60  19        19
#5   NA  <NA> 1_MENS_10       60  19        19
#6   NA  <NA> 1_MENS_10       60  19        19
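
Note that the result above is still grouped by ID. The sketch below shows two ways to get an ungrouped result; it assumes a dplyr version that provides add_count with a name argument (0.8.0 or later, which postdates this answer), and the toy data is hypothetical.

```r
library(dplyr)

# Hypothetical toy data: three rows for ID "a", two rows for ID "b"
toy <- data.frame(ID = c("a", "a", "a", "b", "b"),
                  stringsAsFactors = FALSE)

# Same approach as the answer, with the grouping dropped afterwards
via_mutate <- toy %>%
  group_by(ID) %>%
  mutate(ID_Counts = n()) %>%
  ungroup()

# One-call alternative: add_count appends the per-ID row count
via_add_count <- toy %>% add_count(ID, name = "ID_Counts")

print(via_add_count)
```

Both versions should give the same ID_Counts column; add_count simply avoids the explicit group_by and ungroup steps.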