Here is my data:
df <- data.table(loc.id = c(22,22,23,23,23,24,24,24,25,25,25,27,27,27,27),
month = sample(c(1:12), 15, replace = TRUE))
loc.id month
1: 22 1
2: 22 4
3: 23 12
4: 23 10
5: 23 7
6: 24 4
7: 24 3
8: 24 11
9: 25 2
10: 25 3
11: 25 4
12: 27 1
13: 27 5
14: 27 12
15: 27 1
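For each loc.id I have several months. What I want to do is insert new rows into df. For each loc.id, I want to insert two additional rows: one whose month value equals min(month) - 1 and another whose month value equals max(month) + 1.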
My final data should look like this:
loc.id month
1: 22 1
2: 22 4
3: 22 0 # min(month) - 1
4: 22 5 # max(month) + 1
5: 23 12
6: 23 10
7: 23 7
8: 23 6 # min(month) - 1
9: 23 13 # max(month) + 1
10: 24 4
11: 24 3
12: 24 11
13: 24 2 # min(month) - 1
14: 24 12 # max(month) + 1
.
.
I have managed to add these as columns, but I need them added as rows:
df %>%
group_by(loc.id) %>%
mutate(month.min = min(month) - 1,
month.max = max(month) + 1)
Answer 0 (score: 1)
Using data.table:
dfmm <- df[, .(min.month = min(month) - 1, max.month = max(month) + 1), by = loc.id
][, melt(.SD, id = 1)][, .(loc.id, month = value)]
rbindlist(list(df, dfmm))
Or the shorter option suggested by @Frank in the comments:
df[, rbind(.SD, .(range(month) + c(-1,1))), by = loc.id]
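The rbindlist version gives (the original 15 rows first, then the min(month) - 1 rows, then the max(month) + 1 rows):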
    loc.id month
 1:     22     5
 2:     22     6
 3:     23     1
 4:     23     3
 5:     23     6
 6:     24     4
 7:     24     8
 8:     24     2
 9:     25    12
10:     25     7
11:     25     5
12:     27     8
13:     27    12
14:     27     9
15:     27    10
16:     22     4
17:     23     0
18:     24     1
19:     25     4
20:     27     7
21:     22     7
22:     23     7
23:     24     9
24:     25    13
25:     27    13
If you want the rows ordered by loc.id, you can do:
rbindlist(list(df, dfmm))[order(loc.id)]
Or using dplyr and tidyr:
library(dplyr)
library(tidyr)
df %>%
group_by(loc.id) %>%
summarise(min.month = min(month) - 1,
max.month = max(month) + 1) %>%
gather(key, val, -1) %>%
select(loc.id, month = val) %>%
bind_rows(df, .)
Or (inspired by @Frank's data.table approach):
df %>%
group_by(loc.id) %>%
do(data.frame(month = range(.$month) + c(-1,1))) %>%
bind_rows(df, .)
Data used:
library(data.table)
set.seed(2018)
df <- data.table(loc.id = c(22,22,23,23,23,24,24,24,25,25,25,27,27,27,27),
month = sample(c(1:12), 15, replace = TRUE))
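Because the data above is random, the printed results depend on the seed and the R version. As a cross-check, here is a minimal sketch with fixed months (copied from the first three groups printed in the question), combining the range() idea with rbindlist. The names dt, extra and res are only illustrative and do not appear in the original answer.

```r
library(data.table)

# Fixed months (taken from the question's printed data for loc.id 22 to 24)
# so the result does not depend on the random seed.
dt <- data.table(loc.id = c(22, 22, 23, 23, 23, 24, 24, 24),
                 month  = c(1, 4, 12, 10, 7, 4, 3, 11))

# Two extra rows per group: min(month) - 1 and max(month) + 1.
extra <- dt[, .(month = range(month) + c(-1, 1)), by = loc.id]

# Append the new rows, then sort by group. The sort is stable, so within
# each group the original rows keep their order and the new rows come last.
res <- rbindlist(list(dt, extra))[order(loc.id)]
print(res)
```

With this input, res should have the same layout as the desired output in the question: each group's original rows followed by its two added rows.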
Answer 1 (score: 0)
You can use do(...) and add_row(...):
library(tidyverse)
df %>%
group_by(loc.id) %>%
do(add_row(., loc.id = rep(unique(.$loc.id), 2),
month = c(min(.$month) - 1, max(.$month) + 1))) %>%
ungroup()
# A tibble: 25 x 2
# loc.id month
# <dbl> <dbl>
# 1 22. 7.
# 2 22. 2.
# 3 22. 1.
# 4 22. 8.
# 5 23. 5.
# 6 23. 7.
# 7 23. 7.
# 8 23. 4.
# 9 23. 8.
# 10 24. 9.
# ... with 15 more rows
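In current dplyr, do() is superseded. A possible alternative, assuming dplyr 1.1.0 or later (where reframe() and its .by argument are available), is sketched below. This is not part of the original answers, and the names df2, extra and res2 are only illustrative; the months are fixed rather than sampled so the result is reproducible.

```r
library(dplyr)

# Small fixed example (months copied from the question's printed data).
df2 <- tibble(loc.id = c(22, 22, 23, 23, 23),
              month  = c(1, 4, 12, 10, 7))

# reframe() may return any number of rows per group; here it returns two:
# min(month) - 1 and max(month) + 1.
extra <- df2 %>%
  reframe(month = range(month) + c(-1, 1), .by = loc.id)

# Append the new rows to the original data and sort by group.
res2 <- bind_rows(df2, extra) %>%
  arrange(loc.id)
print(res2)
```

Compared with add_row() inside do(), this builds the extra rows separately and binds them once, which keeps the per-group step to a single expression.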