Assigning 1:5 group IDs at the group level in R

Time: 2018-10-08 06:19:08

Tags: r dataframe dplyr

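In the following dataset, I want to group by time and then by prod_id, and create a group variable that runs over 1:5 at the group level.

For example, in the desired output you can see that time == 1, prod_id == "shoe" is the first group, then time == 1, prod_id == "bird" is the second group, and so on. When time == 2, the sequence continues rather than starting over from 1:5. For example, in row 8 of the desired output, time == 2, prod_id == "boat" has group == 3 rather than 1.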

test <- data.frame('prod_id'= c("shoe",  "shoe", "shoe", "bird", "bird", "bird",
                            "boat", "boat","boat","boat","boat","boat", 
                            "bird", "bird",  "bird", "fish", "fish", "fish",
                            "dog", "dog",  "dog","cow", "cow", "cow",
                            "cat", "cat", "cat", "shoe", "shoe", "shoe",
                            "dog", "dog", "dog", "cat", "cat", "cat",
                            "fish", "fish", "fish", "cow", "cow", "cow"), 
               'time' = c(1, 1, 1, 1, 1, 3,
                         1, 2, 2, 1, 2, 2,
                         1, 3, 3, 4, 4, 1,
                         1, 2, 3, 4, 5, 6,
                         1, 2, 3, 1, 1, 1,
                         1, 2, 3, 4, 5, 6,
                         1, 1, 3, 1, 1, 6))
test

   prod_id time
1     shoe    1
2     shoe    1
3     shoe    1
4     bird    1
5     bird    1
6     bird    3
7     boat    1
8     boat    2
9     boat    2
10    boat    1
11    boat    2
12    boat    2
13    bird    1
14    bird    3
15    bird    3
16    fish    4
17    fish    4
18    fish    1
19     dog    1
20     dog    2
21     dog    3
22     cow    4
23     cow    5
24     cow    6
25     cat    1
26     cat    2
27     cat    3
28    shoe    1
29    shoe    1
30    shoe    1
31     dog    1
32     dog    2
33     dog    3
34     cat    4
35     cat    5
36     cat    6
37    fish    1
38    fish    1
39    fish    3
40     cow    1
41     cow    1
42     cow    6

Desired output:

   prod_id time group
1     shoe    1   1
2     shoe    1   1
3     shoe    1   1
4     bird    1   2
5     bird    1   2 
6     bird    3   1
7     boat    1   3
8     boat    2   3 *
9     boat    2   3
10    boat    1   3
11    boat    2   3
12    boat    2   3
13    bird    1   2
14    bird    3   1
15    bird    3   1
16    fish    4   4
17    fish    4   4
18    fish    1   4
19     dog    1   5
20     dog    2   4
21     dog    3   2
22     cow    4   5
23     cow    5   2
24     cow    6   4
25     cat    1   1 *
26     cat    2   5
27     cat    3   3
28    shoe    1   1
29    shoe    1   1
30    shoe    1   1
31     dog    1   5
32     dog    2   4
33     dog    3   2
34     cat    4   1
35     cat    5   3
36     cat    6   5
37    fish    1   4
38    fish    1   4 
39    fish    3   3
40     cow    1   2
41     cow    1   2 *
42     cow    6   4

If I use dplyr with group_by(time, prod_id), it creates the 1:5 sequence within each group, but I want the sequence to run across the groups.
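To illustrate the problem, here is a minimal sketch of what such an attempt might look like. The exact code is an assumption: the `toy` data frame is a small illustrative subset rather than the full `test` data, and the `row_number()` call is only a guess at how the numbering was done.

```r
library(dplyr)

# Small illustrative subset (hypothetical, not the full `test` data)
toy <- data.frame(prod_id = c("shoe", "shoe", "bird", "bird", "boat"),
                  time    = c(1, 1, 1, 1, 2))

# Numbering inside group_by() restarts for every (time, prod_id) group,
# so it counts rows within a group instead of numbering the groups
within_groups <- toy %>%
  group_by(time, prod_id) %>%
  mutate(group = row_number()) %>%
  ungroup()

within_groups$group
```

Each (time, prod_id) combination gets its own counter here, which is why the result does not label the groups themselves.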

Thanks!

1 answer:

Answer 0: (score: 0)

Here is a first attempt, but it does not exactly match your desired output.

library(data.table)
# convert the sample data to a data.table
dt <- as.data.table(test)
# keep one row per unique (time, prod_id) combination
dt.unique <- unique(dt, by = c("time", "prod_id"))
# number the unique combinations within each time value
dt.unique[, id := seq_len(.N), by = c("time")]
# join the IDs back onto the sample data (one output row per row of dt)
result <- dt.unique[dt, on = c("prod_id", "time")]

Output

#     prod_id time id
#  1:    shoe    1  1
#  2:    shoe    1  1
#  3:    shoe    1  1
#  4:    bird    1  2
#  5:    bird    1  2
#  6:    bird    3  1
#  7:    boat    1  3
#  8:    boat    2  1
#  9:    boat    2  1
# 10:    boat    1  3
# 11:    boat    2  1
# 12:    boat    2  1
# 13:    bird    1  2
# 14:    bird    3  1
# 15:    bird    3  1
# 16:    fish    4  1
# 17:    fish    4  1
# 18:    fish    1  4
# 19:     dog    1  5
# 20:     dog    2  2
# 21:     dog    3  2
# 22:     cow    4  2
# 23:     cow    5  1
# 24:     cow    6  1
# 25:     cat    1  6
# 26:     cat    2  3
# 27:     cat    3  3
# 28:    shoe    1  1
# 29:    shoe    1  1
# 30:    shoe    1  1
# 31:     dog    1  5
# 32:     dog    2  2
# 33:     dog    3  2
# 34:     cat    4  3
# 35:     cat    5  2
# 36:     cat    6  2
# 37:    fish    1  4
# 38:    fish    1  4
# 39:    fish    3  4
# 40:     cow    1  7
# 41:     cow    1  7
# 42:     cow    6  1
#     prod_id time id
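
A possible refinement, assuming the intended rule is "number the unique (time, prod_id) pairs consecutively across all time values, then wrap the numbers into 1:5". That rule is my reading of the question, not something it states outright. The `toy` table below is an illustrative subset of the question's data, and the `pairs` and `group` names are made up for this sketch.

```r
library(data.table)

# Illustrative subset: every product at time 1, three products at time 2,
# plus two repeated rows at the end (hypothetical, not the full data set)
toy <- data.table(
  prod_id = c("shoe", "bird", "boat", "fish", "dog", "cat", "cow",
              "boat", "dog", "cat", "shoe", "boat"),
  time    = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2)
)

# One row per (time, prod_id) pair, in order of first appearance
pairs <- unique(toy, by = c("time", "prod_id"))
# Sort by time; ties keep their order of first appearance
pairs <- pairs[order(time)]
# Number the pairs across all time values and wrap the counter into 1:5
pairs[, group := (seq_len(.N) - 1L) %% 5L + 1L]
# Join the IDs back onto every row, keeping the original row order
result <- pairs[toy, on = c("prod_id", "time")]

result$group
```

On this subset, cat at time 1 wraps around to 1 and boat at time 2 continues with 3, which agrees with rows 25 and 8 of the desired output. Not every row of the desired output appears to follow this rule, though, so treat it as a sketch rather than a full solution.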