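In the following dataset I want to group by time, then by prod_id, and create a group variable that runs 1:5 at the group level.
For example, in the desired output, time == 1 with prod_id == "shoe" is group 1, time == 1 with prod_id == "bird" is group 2, and so on. The difficulty is that at time == 2 the sequence should keep going rather than restart from 1. For example, row 8 of the desired output has time == 2, prod_id == "boat" and group == 3, not 1.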
test <- data.frame('prod_id'= c("shoe", "shoe", "shoe", "bird", "bird", "bird",
"boat", "boat","boat","boat","boat","boat",
"bird", "bird", "bird", "fish", "fish", "fish",
"dog", "dog", "dog","cow", "cow", "cow",
"cat", "cat", "cat", "shoe", "shoe", "shoe",
"dog", "dog", "dog", "cat", "cat", "cat",
"fish", "fish", "fish", "cow", "cow", "cow"),
'time' = c(1, 1, 1, 1, 1, 3,
1, 2, 2, 1, 2, 2,
1, 3, 3, 4, 4, 1,
1, 2, 3, 4, 5, 6,
1, 2, 3, 1, 1, 1,
1, 2, 3, 4, 5, 6,
1, 1, 3, 1, 1, 6))
test
prod_id time
1 shoe 1
2 shoe 1
3 shoe 1
4 bird 1
5 bird 1
6 bird 3
7 boat 1
8 boat 2
9 boat 2
10 boat 1
11 boat 2
12 boat 2
13 bird 1
14 bird 3
15 bird 3
16 fish 4
17 fish 4
18 fish 1
19 dog 1
20 dog 2
21 dog 3
22 cow 4
23 cow 5
24 cow 6
25 cat 1
26 cat 2
27 cat 3
28 shoe 1
29 shoe 1
30 shoe 1
31 dog 1
32 dog 2
33 dog 3
34 cat 4
35 cat 5
36 cat 6
37 fish 1
38 fish 1
39 fish 3
40 cow 1
41 cow 1
42 cow 6
Desired output:
prod_id time group
1 shoe 1 1
2 shoe 1 1
3 shoe 1 1
4 bird 1 2
5 bird 1 2
6 bird 3 1
7 boat 1 3
8 boat 2 3 *
9 boat 2 3
10 boat 1 3
11 boat 2 3
12 boat 2 3
13 bird 1 2
14 bird 3 1
15 bird 3 1
16 fish 4 4
17 fish 4 4
18 fish 1 4
19 dog 1 5
20 dog 2 4
21 dog 3 2
22 cow 4 5
23 cow 5 2
24 cow 6 4
25 cat 1 1 *
26 cat 2 5
27 cat 3 3
28 shoe 1 1
29 shoe 1 1
30 shoe 1 1
31 dog 1 5
32 dog 2 4
33 dog 3 2
34 cat 4 1
35 cat 5 3
36 cat 6 5
37 fish 1 4
38 fish 1 4
39 fish 3 3
40 cow 1 2
41 cow 1 2 *
42 cow 6 4
If I use dplyr with group_by(time, prod_id), the 1:5 sequence is created within each group, but I want the sequence to run across the groups.
Thanks!
Answer 0 (score: 0)
Here is a first attempt, although it does not exactly match your desired output.
library(data.table)
# convert the sample data to a data.table
dt <- as.data.table(test)
# keep one row per (time, prod_id) combination
dt.unique <- unique(dt, by = c("time", "prod_id"))
# number the unique combinations within each time
dt.unique[, id := seq_len(.N), by = "time"]
# left join the IDs back onto the sample data
result <- dt.unique[dt, on = c("prod_id", "time")]
Output
# prod_id time id
# 1: shoe 1 1
# 2: shoe 1 1
# 3: shoe 1 1
# 4: bird 1 2
# 5: bird 1 2
# 6: bird 3 1
# 7: boat 1 3
# 8: boat 2 1
# 9: boat 2 1
# 10: boat 1 3
# 11: boat 2 1
# 12: boat 2 1
# 13: bird 1 2
# 14: bird 3 1
# 15: bird 3 1
# 16: fish 4 1
# 17: fish 4 1
# 18: fish 1 4
# 19: dog 1 5
# 20: dog 2 2
# 21: dog 3 2
# 22: cow 4 2
# 23: cow 5 1
# 24: cow 6 1
# 25: cat 1 6
# 26: cat 2 3
# 27: cat 3 3
# 28: shoe 1 1
# 29: shoe 1 1
# 30: shoe 1 1
# 31: dog 1 5
# 32: dog 2 2
# 33: dog 3 2
# 34: cat 4 3
# 35: cat 5 2
# 36: cat 6 2
# 37: fish 1 4
# 38: fish 1 4
# 39: fish 3 4
# 40: cow 1 7
# 41: cow 1 7
# 42: cow 6 1
# prod_id time id
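The ids above restart at 1 for every time, which is why row 8 gets 1 instead of 3. A possible variation, offered as a sketch rather than a definitive answer, is to number the unique (time, prod_id) pairs with one running counter across all times and wrap it into 1:5 with modulo arithmetic. It assumes that pairs are counted by increasing time and, within a time, by first appearance in the data; the names pairs and group are only illustrative. On the sample data this agrees with the desired output for time == 1 and time == 2, but from row 39 (fish, time == 3) onward some values differ from the hand-written table (row 39 comes out as 4 rather than 3), so treat it as one reading of the rule.

```r
library(data.table)

# Same sample data as in the question
test <- data.frame(
  prod_id = c("shoe", "shoe", "shoe", "bird", "bird", "bird",
              "boat", "boat", "boat", "boat", "boat", "boat",
              "bird", "bird", "bird", "fish", "fish", "fish",
              "dog", "dog", "dog", "cow", "cow", "cow",
              "cat", "cat", "cat", "shoe", "shoe", "shoe",
              "dog", "dog", "dog", "cat", "cat", "cat",
              "fish", "fish", "fish", "cow", "cow", "cow"),
  time = c(1, 1, 1, 1, 1, 3,
           1, 2, 2, 1, 2, 2,
           1, 3, 3, 4, 4, 1,
           1, 2, 3, 4, 5, 6,
           1, 2, 3, 1, 1, 1,
           1, 2, 3, 4, 5, 6,
           1, 1, 3, 1, 1, 6)
)

dt <- as.data.table(test)

# One row per (time, prod_id) pair, in order of first appearance
pairs <- unique(dt[, .(time, prod_id)])

# Sort by time only; ties keep their first-appearance order (assumed rule)
pairs <- pairs[order(time)]

# Running counter over all pairs, wrapped into 1..5
pairs[, group := (seq_len(.N) - 1L) %% 5L + 1L]

# Update join: add the group column to dt without reordering its rows
dt[pairs, on = c("prod_id", "time"), group := i.group]

dt[c(1, 4, 8, 25, 41)]
```

The update join keeps the rows of dt in their original order, so the result can be compared against the desired table line by line.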