Subset an R data.table using another data.table

Date: 2016-04-13 21:16:12

Tags: r data.table subset

My question is similar to "Subsetting a data.table using another data.table" and "Subset a data.table by matching columns of another data.table".

dt is the same as in those questions.

dt

   id year event
1:  2 2005     1
2:  2 2006     1
3:  2 2007     1
4:  4 2008     1
5:  4 2009     1
6:  2 2005     0
7:  4 2006     0
8:  4 2007     0
9:  2 2008     0

library(data.table)

dt <- data.table(id = c(2,2,2,4,4,2,4,4,2), year = c(2005:2009,2005:2008),
                 event = rep(1:0, times=c(5, 4)))

However, dt1 is slightly different:

dt1

   year performance  event
1: 2005        1000      1
2: 2006        1001      1
3: 2007        1002      1
4: 2008        1003      1
5: 2009        1004      1
6: 2005        1005      0
7: 2006        1006      0
8: 2007        1007      0
9: 2008        1008      0

dt1 <- data.table(year = c(2005:2009,2005:2008), performance = 1000:1008,
                  event = rep(1:0, times=c(5, 4)))

I would like to subset dt1 based on both year and event in dt, grouped by id. The desired output is two separate data.tables:

dt1.sub1
   year performance  event
1: 2005        1000      1
2: 2006        1001      1
3: 2007        1002      1
4: 2005        1005      0
5: 2008        1008      0


dt1.sub2
   year performance  event
1: 2008        1003      1
2: 2009        1004      1
3: 2006        1006      0
4: 2007        1007      0

Is there a way to achieve this without using merge?

2 Answers:

Answer 0: (score: 2)

We can use split to create a list of data.tables.

lst <- split(dt1, dt$id)
names(lst) <- paste0('dt1.sub', seq_along(lst))
lst
#$dt1.sub1
#   year performance event
#1: 2005        1000     1
#2: 2006        1001     1
#3: 2007        1002     1
#4: 2005        1005     0
#5: 2008        1008     0

#$dt1.sub2
#   year performance event
#1: 2008        1003     1
#2: 2009        1004     1
#3: 2006        1006     0
#4: 2007        1007     0

It is better to keep working within the list. However, if it is really needed, list2env can create separate data.table objects in the global environment:
list2env(lst, envir = .GlobalEnv)
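
Note that split(dt1, dt$id) pairs rows by position, so it relies on dt and dt1 being in the same row order. Below is a minimal sketch, assuming the example data from the question, that checks this alignment first and then stays within the list instead of creating global objects. The name perf_by_group is only illustrative and does not appear in the original answer.

```r
library(data.table)

# Example data copied from the question
dt <- data.table(id = c(2,2,2,4,4,2,4,4,2), year = c(2005:2009,2005:2008),
                 event = rep(1:0, times = c(5, 4)))
dt1 <- data.table(year = c(2005:2009,2005:2008), performance = 1000:1008,
                  event = rep(1:0, times = c(5, 4)))

# split() with dt$id matches rows by position, so confirm both tables line up
stopifnot(identical(dt$year, dt1$year), identical(dt$event, dt1$event))

lst <- split(dt1, dt$id)
names(lst) <- paste0("dt1.sub", seq_along(lst))

# Work within the list: here, the mean performance of each subset
# (perf_by_group is a hypothetical name used for illustration)
perf_by_group <- sapply(lst, function(x) mean(x$performance))
```

If the two tables were not row-aligned, the stopifnot call would fail, and a join on year and event would be the safer route.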

Answer 1: (score: 2)

dt[dt1, on = c('year', 'event')][, .(list(.SD)), by = id]$V1
#[[1]]
#   year event performance
#1: 2005     1        1000
#2: 2006     1        1001
#3: 2007     1        1002
#4: 2005     0        1005
#5: 2008     0        1008
#
#[[2]]
#   year event performance
#1: 2008     1        1003
#2: 2009     1        1004
#3: 2006     0        1006
#4: 2007     0        1007
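
The same join can also feed a named list. Below is a minimal sketch, assuming the example data from the question and a data.table version whose split method accepts the by argument. The names joined and lst2 are illustrative and not part of the original answer.

```r
library(data.table)

# Example data copied from the question
dt <- data.table(id = c(2,2,2,4,4,2,4,4,2), year = c(2005:2009,2005:2008),
                 event = rep(1:0, times = c(5, 4)))
dt1 <- data.table(year = c(2005:2009,2005:2008), performance = 1000:1008,
                  event = rep(1:0, times = c(5, 4)))

# Join on year and event so that every dt1 row picks up its id from dt
joined <- dt[dt1, on = c("year", "event")]

# Split by id and drop the id column from each piece
lst2 <- split(joined, by = "id", keep.by = FALSE)
names(lst2) <- paste0("dt1.sub", seq_along(lst2))
```

Unlike splitting by position, this matches rows on the key columns, so it does not depend on dt and dt1 sharing the same row order.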