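I've been trying to solve a seemingly simple problem in R. Given the data set called DATA shown below, I want to count how many times each value occurs in the DATA$ID column and put those counts into a new column. For example, the first entry of the new DATA$NEW column would be 19, because that ID occurs 19 times. I can't seem to work out how to do this.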
PRAY NOTES ID DURATION
1 NA <NA> 1_MENS_10 60
2 NA <NA> 1_MENS_10 60
3 NA <NA> 1_MENS_10 60
4 NA <NA> 1_MENS_10 60
5 NA <NA> 1_MENS_10 60
6 NA <NA> 1_MENS_10 60
7 NA <NA> 1_MENS_10 60
8 NA <NA> 1_MENS_10 60
9 NA <NA> 1_MENS_10 60
10 NA <NA> 1_MENS_10 60
11 NA <NA> 1_MENS_10 60
12 NA <NA> 1_MENS_10 60
13 NA <NA> 1_MENS_10 60
14 NA <NA> 1_MENS_10 60
15 NA <NA> 1_MENS_10 60
16 NA <NA> 1_MENS_10 60
17 NA <NA> 1_MENS_10 60
18 NA <NA> 1_MENS_10 60
19 NA <NA> 1_MENS_10 60
20 2 <NA> 1_MENS_14 61
21 3 <NA> 1_MENS_14 61
22 2 <NA> 1_MENS_14 61
23 1 <NA> 1_MENS_14 61
24 1 <NA> 1_MENS_14 61
25 3 <NA> 1_MENS_14 61
26 2 <NA> 1_MENS_14 61
27 3 <NA> 1_MENS_14 61
28 1 <NA> 1_MENS_14 61
29 3 <NA> 1_MENS_14 61
30 3 <NA> 1_MENS_14 61
Here is the dput:
structure(list(PRAY = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L, 3L, 2L, 1L, 1L, 3L, 2L,
3L, 1L, 3L, 3L), NOTES = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "<NA>", class = "factor"),
ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("1_MENS_10", "1_MENS_14"), class = "factor"),
DURATION = c(60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 60L, 61L, 61L,
61L, 61L, 61L, 61L, 61L, 61L, 61L, 61L, 61L), NEW = c(19L,
19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L,
19L, 19L, 19L, 19L, 19L, 19L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L)), .Names = c("PRAY", "NOTES", "ID",
"DURATION", "NEW"), row.names = c(NA, -30L), class = "data.frame")
Answer 0: (score: 3)
Using the data.table package:
library(data.table)
setDT(DATA)[, NEW := .N, by = ID]
DATA
## PRAY NOTES ID DURATION NEW
## 1: NA <NA> 1_MENS_10 60 19
## 2: NA <NA> 1_MENS_10 60 19
## 3: NA <NA> 1_MENS_10 60 19
## 4: NA <NA> 1_MENS_10 60 19
## 5: NA <NA> 1_MENS_10 60 19
## 6: NA <NA> 1_MENS_10 60 19
## 7: NA <NA> 1_MENS_10 60 19
....
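setDT converts a data.frame to a data.table by reference (meaning no copy of the data is made), so it is very fast. We then group by ID and use := to add a new column NEW set to .N, a built-in special variable that holds the number of rows in each group.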
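Note: in v1.9.3 the setDF function is now exported, which lets you get back a data.frame, again by reference. So if you want to stick with a data.frame for some reason, you can call setDF(.) on the result.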
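The round trip described above can be sketched as follows. This is only an illustration under stated assumptions: the small data frame `toy` and its values are made up, and it assumes a data.table version in which setDF is available.

```r
library(data.table)

# Hypothetical toy data standing in for DATA
toy <- data.frame(
  ID = c("a", "a", "a", "b", "b"),
  DURATION = c(60L, 60L, 60L, 61L, 61L)
)

setDT(toy)                  # convert to data.table in place, without copying
toy[, NEW := .N, by = ID]   # .N is the number of rows in each ID group
setDF(toy)                  # convert back to a plain data.frame, again in place

class(toy)
toy$NEW
```

After the last step `toy` is an ordinary data.frame again, with the per-ID counts in NEW.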
Answer 1: (score: 2)
plyr makes this easy and lets you stick with a data frame:
library(plyr)
# The data frame in the question is named DATA, so use that name here
DATA <- ddply(DATA, .(ID), transform, NEW = length(ID))
Answer 2: (score: 1)
Here is the dplyr equivalent, to complete the set:
library(dplyr)
DATA <- DATA %>% group_by(ID) %>% mutate(ID_Counts = n())
head(DATA)
#Source: local data frame [6 x 6]
#Groups: ID
#
# PRAY NOTES ID DURATION NEW ID_Counts
#1 NA <NA> 1_MENS_10 60 19 19
#2 NA <NA> 1_MENS_10 60 19 19
#3 NA <NA> 1_MENS_10 60 19 19
#4 NA <NA> 1_MENS_10 60 19 19
#5 NA <NA> 1_MENS_10 60 19 19
#6 NA <NA> 1_MENS_10 60 19 19
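As a quick sanity check on the counts, the same numbers can be reproduced in base R without any extra package. This is a sketch using ave() and table(), not the method of any answer above; the data frame `toy` is a made-up stand-in that only mimics the ID pattern of DATA (19 rows of one ID, 11 of the other).

```r
# Hypothetical stand-in for DATA with the same ID pattern as in the question
toy <- data.frame(
  ID = factor(c(rep("1_MENS_10", 19), rep("1_MENS_14", 11))),
  DURATION = c(rep(60L, 19), rep(61L, 11))
)

# ave() applies FUN within each ID group and recycles the result
# so that the output has one value per row of the input
toy$NEW <- ave(seq_along(toy$ID), toy$ID, FUN = length)

# table() gives one count per distinct ID, for comparison
id_counts <- table(toy$ID)
id_counts
```

Every row's NEW value should match the entry for its ID in `id_counts`.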