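I need to count the number of active instances per month for each ID. I was able to do this with a for loop, but my dataset is very large (12k IDs) and the loop takes a long time to finish. Any suggestions for a better solution?
My sample data looks like this: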
ID instances start_month end_month
key1 x1 1397 1400
key1 x2 1395 1402
key1 x3 1399 1402
key1 x4 1398 1401
key2 x5 1396 1401
key2 x6 1398 1402
key2 x7 1398 1402
I would like my output to look like this:
key1 1395 1
key1 1396 1
key1 1397 2
key1 1398 3
key1 1399 4
key1 1400 4
key1 1401 3
key1 1402 2
key2 1396 1
key2 1397 1
key2 1398 3
key2 1399 3
key2 1400 3
key2 1401 3
key2 1402 2
Answer 0 (score: 3)
Using dplyr:
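library(dplyr)

DF %>%
  group_by(ID, instances) %>%                          # one group per instance (a single row each)
  do(data.frame(out = .$start_month:.$end_month)) %>%  # expand each instance to every month it is active
  ungroup() %>%
  count(ID, out)                                       # number of active instances per ID and month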
# # A tibble: 15 x 3
# ID out n
# <chr> <int> <int>
# 1 key1 1395 1
# 2 key1 1396 1
# 3 key1 1397 2
# 4 key1 1398 3
# 5 key1 1399 4
# 6 key1 1400 4
# 7 key1 1401 3
# 8 key1 1402 2
# 9 key2 1396 1
# 10 key2 1397 1
# 11 key2 1398 3
# 12 key2 1399 3
# 13 key2 1400 3
# 14 key2 1401 3
# 15 key2 1402 2
Data:
DF <- structure(list(ID = c("key1", "key1", "key1", "key1", "key2",
"key2", "key2"), instances = c("x1", "x2", "x3", "x4", "x5",
"x6", "x7"), start_month = c(1397L, 1395L, 1399L, 1398L, 1396L,
1398L, 1398L), end_month = c(1400L, 1402L, 1402L, 1401L, 1401L,
1402L, 1402L)), .Names = c("ID", "instances", "start_month",
"end_month"), class = "data.frame", row.names = c(NA, -7L))
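For comparison, the same expand-then-count idea can be sketched in base R, without the per-group `do()` call. This is only a minimal sketch: the names `active_months`, `expanded` and `counts` are illustrative and not part of the original answer, and it has not been benchmarked on a 12k-ID dataset, so any speed-up is an assumption to verify on your own data.

```r
# Sample data from the question (rebuilt here so the block is self-contained)
DF <- data.frame(
  ID          = c("key1", "key1", "key1", "key1", "key2", "key2", "key2"),
  instances   = c("x1", "x2", "x3", "x4", "x5", "x6", "x7"),
  start_month = c(1397L, 1395L, 1399L, 1398L, 1396L, 1398L, 1398L),
  end_month   = c(1400L, 1402L, 1402L, 1401L, 1401L, 1402L, 1402L),
  stringsAsFactors = FALSE
)

# One integer sequence of active months per instance (row)
active_months <- Map(seq, DF$start_month, DF$end_month)

# Long table: one row per (ID, active month) pair
expanded <- data.frame(
  ID    = rep(DF$ID, lengths(active_months)),
  month = unlist(active_months),
  n     = 1L,
  stringsAsFactors = FALSE
)

# Sum the indicator column to count active instances per ID and month
counts <- aggregate(n ~ ID + month, data = expanded, FUN = sum)
counts <- counts[order(counts$ID, counts$month), ]
rownames(counts) <- NULL

print(counts)
```

The `Map(seq, ...)` step assumes `start_month <= end_month` on every row; a row with the bounds reversed would produce a descending sequence rather than an error, so it may be worth validating the input first.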