Check whether a number lies between 2 columns in R

Time: 2017-06-21 05:14:04

Tags: r data.table dplyr plyr

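I need to count the number of active instances for each ID in each month. An instance counts as active in every month from its start_month through its end_month, inclusive. I was able to do this with a for loop, but my data set is very large (12k IDs) and the loop takes a long time to finish. Any suggestions for a better solution would be appreciated.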

A sample of my data looks like this:

ID  instances   start_month end_month
key1    x1  1397    1400
key1    x2  1395    1402
key1    x3  1399    1402
key1    x4  1398    1401
key2    x5  1396    1401
key2    x6  1398    1402
key2    x7  1398    1402

I would like my output to look like this (ID, month, number of active instances):

key1    1395    1
key1    1396    1
key1    1397    2
key1    1398    3
key1    1399    4
key1    1400    4
key1    1401    3
key1    1402    2
key2    1396    1
key2    1397    1
key2    1398    3
key2    1399    3
key2    1400    3
key2    1401    3
key2    1402    2

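1 answer:

Answer 0: (score: 3)

Using dplyr: expand each instance into one row per active month, then count the rows per ID and month.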

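library(dplyr)

DF %>%
  group_by(ID, instances) %>%                          # one group per instance row
  do(data.frame(out = .$start_month:.$end_month)) %>%  # one row per active month
  ungroup() %>%
  count(ID, out)                                       # active instances per ID and month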

# # A tibble: 15 x 3
#       ID   out     n
#    <chr> <int> <int>
#  1  key1  1395     1
#  2  key1  1396     1
#  3  key1  1397     2
#  4  key1  1398     3
#  5  key1  1399     4
#  6  key1  1400     4
#  7  key1  1401     3
#  8  key1  1402     2
#  9  key2  1396     1
# 10  key2  1397     1
# 11  key2  1398     3
# 12  key2  1399     3
# 13  key2  1400     3
# 14  key2  1401     3
# 15  key2  1402     2
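
Since the question is also tagged data.table, the same expand-then-count idea can be sketched there as well. This is only an illustrative variant, not part of the original answer; the names DT, expanded, month and res are made up for the example, and it assumes each (ID, instances) pair appears in exactly one row.

```r
library(data.table)

# Same sample data as in the question, built as a data.table
DT <- data.table(
  ID          = c("key1", "key1", "key1", "key1", "key2", "key2", "key2"),
  instances   = c("x1", "x2", "x3", "x4", "x5", "x6", "x7"),
  start_month = c(1397L, 1395L, 1399L, 1398L, 1396L, 1398L, 1398L),
  end_month   = c(1400L, 1402L, 1402L, 1401L, 1401L, 1402L, 1402L)
)

# Expand every instance into one row per month in [start_month, end_month].
# Assumption: (ID, instances) is unique, so each group holds a single row.
expanded <- DT[, .(month = seq(start_month, end_month)), by = .(ID, instances)]

# Count the active instances per ID and month, sorted for readability
res <- expanded[, .N, by = .(ID, month)][order(ID, month)]
print(res)
```

On the sample data this should give the same 15 ID/month counts as the dplyr version, with the count column named N. Whether it is faster on 12k IDs would need to be measured on the real data.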

Data

DF <- structure(list(ID = c("key1", "key1", "key1", "key1", "key2", 
"key2", "key2"), instances = c("x1", "x2", "x3", "x4", "x5", 
"x6", "x7"), start_month = c(1397L, 1395L, 1399L, 1398L, 1396L, 
1398L, 1398L), end_month = c(1400L, 1402L, 1402L, 1401L, 1401L, 
1402L, 1402L)), .Names = c("ID", "instances", "start_month", 
"end_month"), class = "data.frame", row.names = c(NA, -7L))