Cbind two data sets, but using data.table

Time: 2015-07-01 18:07:56

Tags: r data.table

I have two data sets, as follows:

data1<-structure(list(gear = c(3, 3, 3, 3, 5, 3, 3, 3, 4, 4, 3, 3, 3, 
3, 3, 5, 5, 5), carb = c(2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 
4, 4, 4, 4, 6, 8)), .Names = c("gear", "carb"), class = "data.frame", row.names = c(NA, 
-18L))

data1
   gear carb
1     3    2
2     3    2
3     3    2
4     3    2
5     5    2
6     3    3
7     3    3
8     3    3
9     4    4
10    4    4
11    3    4
12    3    4
13    3    4
14    3    4
15    3    4
16    5    4
17    5    6
18    5    8

data2<-structure(list(carb = c(1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 
4, 4, 6, 8), fac = c(1L, 1L, 2L, 3L, 4L, 5L, 1L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, 1L, 1L), hello = c(NA, 0.292389553859123, 
0.584779107718246, 0.804071273112588, 0.804071273112588, 0.402035636556294, 
NA, 0.460230801434478, 1.25285051501608, 1.15057700358619, 0.869324847154013, 
0.818188091439071, 0.894893225011484, 0.792619713581601, 0.51136755714942, 
NA, NA), hello2 = c(NA, 5L, 5L, 5L, 5L, 4L, NA, 1L, 1L, 2L, 2L, 
1L, 1L, 2L, 2L, NA, NA)), row.names = c(NA, -17L), class = "data.frame", .Names = c("carb", 
"fac", "hello", "hello2"))

data2
   carb fac     hello hello2
1     1   1        NA     NA
2     2   1 0.2923896      5
3     2   2 0.5847791      5
4     2   3 0.8040713      5
5     2   4 0.8040713      5
6     2   5 0.4020356      4
7     3   1        NA     NA
8     4   1 0.4602308      1
9     4   2 1.2528505      1
10    4   3 1.1505770      2
11    4   4 0.8693248      2
12    4   5 0.8181881      1
13    4   6 0.8948932      1
14    4   7 0.7926197      2
15    4   8 0.5113676      2
16    6   1        NA     NA
17    8   1        NA     NA

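Suppose data1 is the main data set. I want to cbind data1 and data2 (NOT merge). However, as you can see, they do not have the same number of rows. I have been trying to do this by using the common variable carb.

If a carb category is in data2 but not in data1, I don't want to bind that category from data2. For example, in the data above, carb = 1 is in data2 but not in data1, so it should be ignored when cbinding. If a category exists in both data sets but with a different number of rows, I want to use the number of rows that category has in data1. For example, for carb = 3 there are 3 rows in data1 but only 1 in data2, so before the cbind I need data2 to have 3 rows for carb 3; the two extra rows should just be copies of the existing one. My desired output (the row order must stay the same as in data1):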

     +----------------------------------------+
     | gear   carb   fac       hello   hello2 |
     |----------------------------------------|
  1. |    3      2     1   0.2923896        5 |
  2. |    3      2     2   0.5847791        5 |
  3. |    3      2     3   0.8040713        5 |
  4. |    3      2     4   0.8040713        5 |
  5. |    5      2     5   0.4020356        4 |
     |----------------------------------------|
  6. |    3      3     1          NA       NA |
  7. |    3      3     1          NA       NA |
  8. |    3      3     1          NA       NA |
  9. |    4      4     1   0.4602308        1 |
 10. |    4      4     2   1.2528505        1 |
     |----------------------------------------|
 11. |    3      4     3    1.150577        2 |
 12. |    3      4     4   0.8693248        2 |
 13. |    3      4     5   0.8181881        1 |
 14. |    3      4     6   0.8948932        1 |
 15. |    3      4     7   0.7926197        2 |
     |----------------------------------------|
 16. |    5      4     8   0.5113676        2 |
 17. |    5      6     1          NA       NA |
 18. |    5      8     1          NA       NA |
     +----------------------------------------+
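The rule above depends on how many rows each carb category has in each data set. A small sketch that puts the two sets of counts side by side, assuming only the carb columns copied from data1 and data2 (the names `d1`, `d2`, `n1`, `n2` and `counts` are illustrative):

```r
library(data.table)

# carb columns copied from data1 and data2 above
d1 <- data.table(carb = c(2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 6, 8))
d2 <- data.table(carb = c(1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 6, 8))

# Row count per carb category in each data set
n1 <- d1[, .(n_data1 = .N), by = carb]
n2 <- d2[, .(n_data2 = .N), by = carb]

# Full outer join, so a category missing from either side shows up as NA
counts <- merge(n1, n2, by = "carb", all = TRUE)
counts
```

Only carb = 3 differs (3 rows in data1, 1 in data2), and carb = 1 exists only in data2, which is exactly the category that should be dropped.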

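I'd like to know whether the data.table package has a specific function for this.

2 answers:

Answer 0 (score: 5)

I think you do want a merge after all: number the rows within each carb group of data1, then join data2 on carb plus that row number: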

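library(data.table)

setDT(data1)  # convert both data frames to data.tables by reference
setDT(data2)

# Number the rows within each carb group of data1
data1[, fac := 1:.N, by = carb]

# Key both tables on (carb, fac) and look up data2 for every row of data1
setkey(data1, carb, fac)
setkey(data2, carb, fac)
data2[data1]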

gives

    carb fac     hello hello2 gear
 1:    2   1 0.2923896      5    3
 2:    2   2 0.5847791      5    3
 3:    2   3 0.8040713      5    3
 4:    2   4 0.8040713      5    3
 5:    2   5 0.4020356      4    5
 6:    3   1        NA     NA    3
 7:    3   2        NA     NA    3
 8:    3   3        NA     NA    3
 9:    4   1 0.4602308      1    4
10:    4   2 1.2528505      1    4
11:    4   3 1.1505770      2    3
12:    4   4 0.8693248      2    3
13:    4   5 0.8181881      1    3
14:    4   6 0.8948932      1    3
15:    4   7 0.7926197      2    3
16:    4   8 0.5113676      2    5
17:    6   1        NA     NA    5
18:    8   1        NA     NA    5
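Note that `setkey()` physically sorts data1 by carb and fac. Here data1 is already ordered by carb, so its row order survives, but the question asks for data1's order to be kept in general. A hedged variant, assuming a data.table version with the `on =` argument (1.9.6 or later), joins ad hoc without setting keys, so the rows come back in data1's original order. The data is a trimmed copy of the question's (the hello column is left out for brevity):

```r
library(data.table)

# Trimmed copy of the question's data (hello omitted)
data1 <- data.table(
  gear = c(3, 3, 3, 3, 5, 3, 3, 3, 4, 4, 3, 3, 3, 3, 3, 5, 5, 5),
  carb = c(2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 6, 8)
)
data2 <- data.table(
  carb   = c(1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 6, 8),
  fac    = c(1L, 1:5, 1L, 1:8, 1L, 1L),
  hello2 = c(NA, 5L, 5L, 5L, 5L, 4L, NA, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, NA, NA)
)

# Number the rows within each carb group, in data1's existing order
data1[, fac := seq_len(.N), by = carb]

# Ad hoc join on (carb, fac): one result row per data1 row, in data1's order;
# carb = 1 exists only in data2 and is therefore dropped
res <- data2[data1, on = .(carb, fac)]
res
```

Unmatched rows of data1 (carb = 3 with fac 2 and 3) get NA, because the default `nomatch = NA` keeps every row of `i`.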

Answer 1 (score: 2)

Using the same idea as @Frank's answer (adding fac to data1):

library(dplyr)

data1 %>%
  group_by(carb) %>%
  mutate(fac = row_number()) %>%          # number rows within each carb group
  left_join(data2, by = c("carb", "fac"))  # keep every row of data1

gives:

#Source: local data frame [18 x 5]
#Groups: carb
#
#   gear carb fac     hello hello2
#1     3    2   1 0.2923896      5
#2     3    2   2 0.5847791      5
#3     3    2   3 0.8040713      5
#4     3    2   4 0.8040713      5
#5     5    2   5 0.4020356      4
#6     3    3   1        NA     NA
#7     3    3   2        NA     NA
#8     3    3   3        NA     NA
#9     4    4   1 0.4602308      1
#10    4    4   2 1.2528505      1
#11    3    4   3 1.1505770      2
#12    3    4   4 0.8693248      2
#13    3    4   5 0.8181881      1
#14    3    4   6 0.8948932      1
#15    3    4   7 0.7926197      2
#16    5    4   8 0.5113676      2
#17    5    6   1        NA     NA
#18    5    8   1        NA     NA
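Both answers give fac = 1, 2, 3 (with NA values) for carb = 3, whereas the desired output in the question repeats data2's single carb = 3 row (fac = 1, 1, 1). If that exact layout matters, one option is to cap each data1 row's within-group position at the number of data2 rows for its carb before joining. A minimal data.table sketch, assuming that the row to repeat is data2's last row for that carb (with the question's data, carb = 3 has a single row, so first and last coincide); the helper columns `pos` and `n_data2` are illustrative names, and hello is left out for brevity:

```r
library(data.table)

# Trimmed copy of the question's data (hello omitted)
data1 <- data.table(
  gear = c(3, 3, 3, 3, 5, 3, 3, 3, 4, 4, 3, 3, 3, 3, 3, 5, 5, 5),
  carb = c(2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 6, 8)
)
data2 <- data.table(
  carb   = c(1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 6, 8),
  fac    = c(1L, 1:5, 1L, 1:8, 1L, 1L),
  hello2 = c(NA, 5L, 5L, 5L, 5L, 4L, NA, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, NA, NA)
)

# Position of each data1 row within its carb group
data1[, pos := seq_len(.N), by = carb]

# Number of data2 rows per carb, attached to data1 with an update join
counts2 <- data2[, .(n_data2 = .N), by = carb]
data1[counts2, on = "carb", n_data2 := i.n_data2]

# Use the data2 row at the same position, or its last row if data2 runs out
data1[, fac := pmin(pos, n_data2)]

# Join in data1's order and keep the columns from the desired output
res <- data2[data1, on = .(carb, fac)][, .(gear, carb, fac, hello2)]
res
```

If data2 has more rows for a carb than data1, `pmin()` just picks the first ones, which matches the rule of using data1's row counts.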