I'm not an R beginner, but I'm really struggling with this problem. I have a data frame (here is an example):
id name dateA
1 A 150
1 A 160
2 B 110
2 B 1009
2 B 98
2 B 309
3 C 218
3 C 310
4 D 219
I want to create 3 new columns (minA, maxA, repA):
minA = min(dateA) for each id
maxA = max(dateA) for each id
repA = number of repetitions (rows) for each id
The result should look like this, with the new columns filled in:
id name dateA minA maxA repA
1 A 150
1 A 160
2 B 110
2 B 1009
2 B 98
2 B 309
3 C 218
3 C 310
4 D 219
Thanks for your help. I hope I was clear enough.
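The three column definitions in the question can also be sketched in base R with no extra packages. This is an alternative to the answers below, not one of them: it uses `ave()`, which applies a function within each `id` group and returns a vector aligned with the original rows. It assumes the example data is held in a data frame named `df1` with `dateA` stored as integers.

```r
# Example data from the question (dateA as integers, so "098" becomes 98)
df1 <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L),
  name  = c("A", "A", "B", "B", "B", "B", "C", "C", "D"),
  dateA = c(150L, 160L, 110L, 1009L, 98L, 309L, 218L, 310L, 219L)
)

# ave() computes FUN within each id group and repeats the result on
# every row of that group, so it can be assigned directly as a column
df1$minA <- ave(df1$dateA, df1$id, FUN = min)
df1$maxA <- ave(df1$dateA, df1$id, FUN = max)

# length() within each group counts the rows (repetitions) for that id
df1$repA <- ave(df1$dateA, df1$id, FUN = length)

print(df1)
```

Because `ave()` keeps the original row order, no join step is needed; the trade-off is one pass over the data per summary column.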
Answer 0 (score: 4)
You could try:
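library(data.table) # v1.9.5+
# convert df1 to a data.table in place, then add the three columns
# by reference (:=), computed within each id group
setDT(df1)[, c('minA', 'maxA', 'repA') := list(min(dateA), max(dateA), .N),
           by = id]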
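For the updated dataset (df2, which has an extra typeP column), we create the columns 'minA', 'maxA' and 'repA' as before, assigning (:=) min(dateA), max(dateA) and .N grouped by 'id'. We then set 'id' as the key column (setkey(.., id)) and join with the output of reshaping typeP from 'long' to 'wide' format (dcast(df2, ..)).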
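# add the summary columns to df2 by reference, key it on id, then join the
# per-id count of each typeP level (long-to-wide via dcast with length)
setkey(setDT(df2)[, c('minA', 'maxA', 'repA') := list(min(dateA),
       max(dateA), .N), by = id], id)[
  dcast(df2, id ~ typeP, value.var = 'typeP', length)]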
# id name dateA typeP minA maxA repA P1 P2 P3
#1: 1 A 150 P1 150 160 2 2 0 0
#2: 1 A 160 P1 150 160 2 2 0 0
#3: 2 B 110 P2 98 1009 4 1 3 0
#4: 2 B 1009 P2 98 1009 4 1 3 0
#5: 2 B 98 P1 98 1009 4 1 3 0
#6: 2 B 309 P2 98 1009 4 1 3 0
#7: 3 C 218 P2 218 310 2 0 1 1
#8: 3 C 310 P3 218 310 2 0 1 1
#9: 4 D 219 P1 219 219 1 1 0 0
df1 <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L),
name = c("A",
"A", "B", "B", "B", "B", "C", "C", "D"), dateA = c(150L, 160L,
110L, 1009L, 98L, 309L, 218L, 310L, 219L)), .Names = c("id",
"name", "dateA"), class = "data.frame", row.names = c(NA, -9L))
df2 <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L),
name = c("A",
"A", "B", "B", "B", "B", "C", "C", "D"), dateA = c(150L, 160L,
110L, 1009L, 98L, 309L, 218L, 310L, 219L), typeP = c("P1", "P1",
"P2", "P2", "P1", "P2", "P2", "P3", "P1")), .Names = c("id",
"name", "dateA", "typeP"), class = "data.frame",
row.names = c(NA, -9L))
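The `dcast(df2, id ~ typeP, value.var = 'typeP', length)` step counts how many rows of each typeP level occur for each id. As a rough base-R sketch of the same idea (a swapped-in technique, not the data.table code above), `table()` can build the id-by-typeP count matrix and `merge()` can attach it to every row by id. The names `counts` and `wide` are illustrative only.

```r
# Same data as df2 above, built as a plain data.frame
df2 <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L),
  name  = c("A", "A", "B", "B", "B", "B", "C", "C", "D"),
  dateA = c(150L, 160L, 110L, 1009L, 98L, 309L, 218L, 310L, 219L),
  typeP = c("P1", "P1", "P2", "P2", "P1", "P2", "P2", "P3", "P1")
)

# Cross-tabulate id against typeP; as.data.frame.matrix keeps the wide
# layout (one row per id, one column per typeP level)
counts <- as.data.frame.matrix(table(df2$id, df2$typeP))
counts$id <- as.integer(rownames(counts))

# Join the per-id counts back onto every row of df2
wide <- merge(df2, counts, by = "id")
print(wide)
```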
Answer 1 (score: 2)
Using dplyr:
library(dplyr)
Data <- Data %>%
group_by(id) %>%
mutate(minA = min(dateA), maxA = max(dateA), repA = n())
gives
> Data
Source: local data frame [9 x 6]
Groups: id
id name dateA minA maxA repA
1 1 A 150 150 160 2
2 1 A 160 150 160 2
3 2 B 110 98 1009 4
4 2 B 1009 98 1009 4
5 2 B 98 98 1009 4
6 2 B 309 98 1009 4
7 3 C 218 218 310 2
8 3 C 310 218 310 2
9 4 D 219 219 219 1
Answer 2 (score: 1)
You can use data.table as follows:
setDT(dat)
setkey(dat, id) # key on id so that dat[agg_dat] joins on id
agg_dat <- dat[,.(minA = min(dateA), maxA = max(dateA), repA = .N), by = id]
dat[agg_dat]
Here agg_dat contains the aggregated data (one row per id), and dat[agg_dat] joins the aggregated data back onto the dataset by id.
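The same aggregate-then-join pattern can be sketched in base R, assuming no data.table is available: `tapply()` builds one row per id, and `merge()` takes the place of the keyed join `dat[agg_dat]`. This is an illustrative alternative, not the answer's code.

```r
dat <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L),
  name  = c("A", "A", "B", "B", "B", "B", "C", "C", "D"),
  dateA = c(150L, 160L, 110L, 1009L, 98L, 309L, 218L, 310L, 219L)
)

# Step 1: aggregate to one row per id; tapply returns results named
# by the (sorted) id values, all in the same order
mins <- tapply(dat$dateA, dat$id, min)
agg_dat <- data.frame(
  id   = as.integer(names(mins)),
  minA = as.vector(mins),
  maxA = as.vector(tapply(dat$dateA, dat$id, max)),
  repA = as.vector(tapply(dat$dateA, dat$id, length))
)

# Step 2: join the aggregates back onto every row by id
result <- merge(dat, agg_dat, by = "id")
print(result)
```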