I have two datasets as follows:
data1<-structure(list(gear = c(3, 3, 3, 3, 5, 3, 3, 3, 4, 4, 3, 3, 3,
3, 3, 5, 5, 5), carb = c(2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4,
4, 4, 4, 4, 6, 8)), .Names = c("gear", "carb"), class = "data.frame", row.names = c(NA,
-18L))
data1
gear carb
1 3 2
2 3 2
3 3 2
4 3 2
5 5 2
6 3 3
7 3 3
8 3 3
9 4 4
10 4 4
11 3 4
12 3 4
13 3 4
14 3 4
15 3 4
16 5 4
17 5 6
18 5 8
data2<-structure(list(carb = c(1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4,
4, 4, 6, 8), fac = c(1L, 1L, 2L, 3L, 4L, 5L, 1L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 1L, 1L), hello = c(NA, 0.292389553859123,
0.584779107718246, 0.804071273112588, 0.804071273112588, 0.402035636556294,
NA, 0.460230801434478, 1.25285051501608, 1.15057700358619, 0.869324847154013,
0.818188091439071, 0.894893225011484, 0.792619713581601, 0.51136755714942,
NA, NA), hello2 = c(NA, 5L, 5L, 5L, 5L, 4L, NA, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 2L, NA, NA)), row.names = c(NA, -17L), class = "data.frame", .Names = c("carb",
"fac", "hello", "hello2"))
carb fac hello hello2
1 1 1 NA NA
2 2 1 0.2923896 5
3 2 2 0.5847791 5
4 2 3 0.8040713 5
5 2 4 0.8040713 5
6 2 5 0.4020356 4
7 3 1 NA NA
8 4 1 0.4602308 1
9 4 2 1.2528505 1
10 4 3 1.1505770 2
11 4 4 0.8693248 2
12 4 5 0.8181881 1
13 4 6 0.8948932 1
14 4 7 0.7926197 2
15 4 8 0.5113676 2
16 6 1 NA NA
17 8 1 NA NA
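Suppose data1 is the main dataset. I want to cbind data1 and data2 (NOT merge them). As you can see, however, they do not have the same number of rows. One way I have tried to accomplish this is by using the common variable carb. If a carb category is in data2 but not in data1, I do not want to bind that category from data2. For example, in the data above, carb with value 1 is in data2 but not in data1, so it should be ignored when cbinding. If a category exists in both datasets but with a different number of rows, I want to use the number of rows that category has in data1. For example, for carb = 3 there are 3 rows in data1 but only 1 in data2, so I need 3 rows for carb 3 in data2 before the cbind. The two additional rows should simply be copies of the existing carb 3 row. My desired output (the row order must stay intact, as in data1):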
+----------------------------------------+
| gear carb fac hello hello2 |
|----------------------------------------|
1. | 3 2 1 0.2923896 5 |
2. | 3 2 2 0.5847791 5 |
3. | 3 2 3 0.8040713 5 |
4. | 3 2 4 0.8040713 5 |
5. | 5 2 5 0.4020356 4 |
|----------------------------------------|
6. | 3 3 1 NA NA |
7. | 3 3 1 NA NA |
8. | 3 3 1 NA NA |
9. | 4 4 1 0.4602308 1 |
10. | 4 4 2 1.2528505 1 |
|----------------------------------------|
11. | 3 4 3 1.150577 2 |
12. | 3 4 4 0.8693248 2 |
13. | 3 4 5 0.8181881 1 |
14. | 3 4 6 0.8948932 1 |
15. | 3 4 7 0.7926197 2 |
|----------------------------------------|
16. | 5 4 8 0.5113676 2 |
17. | 5 6 1 NA NA |
18. | 5 8 1 NA NA |
+----------------------------------------+
I was wondering whether the data.table package has a specific function for this.
Answer 0 (score: 5)
I think you do want a merge:
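library(data.table)
setDT(data1)  # convert both data frames to data.table by reference
setDT(data2)
data1[, fac := seq_len(.N), by = carb]  # running index within each carb group
setkey(data1, carb, fac)  # note: setkey sorts the table by carb, fac
setkey(data2, carb, fac)
data2[data1]  # for every row of data1, look up the matching (carb, fac) row in data2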
which gives:
carb fac hello hello2 gear
1: 2 1 0.2923896 5 3
2: 2 2 0.5847791 5 3
3: 2 3 0.8040713 5 3
4: 2 4 0.8040713 5 3
5: 2 5 0.4020356 4 5
6: 3 1 NA NA 3
7: 3 2 NA NA 3
8: 3 3 NA NA 3
9: 4 1 0.4602308 1 4
10: 4 2 1.2528505 1 4
11: 4 3 1.1505770 2 3
12: 4 4 0.8693248 2 3
13: 4 5 0.8181881 1 3
14: 4 6 0.8948932 1 3
15: 4 7 0.7926197 2 3
16: 4 8 0.5113676 2 5
17: 6 1 NA NA 5
18: 8 1 NA NA 5
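If the original row order of data1 matters, note that setkey physically sorts a table. An ad hoc join with the `on=` argument avoids that re-sorting. The following is a minimal sketch, assuming data.table 1.9.6 or later; the small tables `d1` and `d2` are hypothetical stand-ins for data1 and data2, not the data from the question:

```r
library(data.table)

# Hypothetical stand-ins for data1/data2, deliberately NOT sorted by carb
d1 <- data.table(gear = c(5, 3, 3, 4), carb = c(4, 2, 2, 4))
d2 <- data.table(carb = c(2, 2, 4), fac = c(1L, 2L, 1L), hello = c(0.1, 0.2, 0.3))

# Running index within each carb group, assigned in the current row order
d1[, fac := seq_len(.N), by = carb]

# Join without setkey: one result row per row of d1, kept in d1's order;
# (carb, fac) pairs missing from d2 get NA in d2's columns
res <- d2[d1, on = c("carb", "fac")]
res
```

Here the second carb 4 row of `d1` has no counterpart in `d2`, so its `hello` value comes back as NA, just as carb 3, 6 and 8 do in the output above.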
Answer 1 (score: 2)
Using the same idea as @Frank's answer (adding fac to data1):
library(dplyr)
data1 %>%
  group_by(carb) %>%
  mutate(fac = row_number()) %>%  # running index within each carb group
  left_join(data2, by = c("carb", "fac"))  # keeps every row of data1
which gives:
#Source: local data frame [18 x 5]
#Groups: carb
#
# gear carb fac hello hello2
#1 3 2 1 0.2923896 5
#2 3 2 2 0.5847791 5
#3 3 2 3 0.8040713 5
#4 3 2 4 0.8040713 5
#5 5 2 5 0.4020356 4
#6 3 3 1 NA NA
#7 3 3 2 NA NA
#8 3 3 3 NA NA
#9 4 4 1 0.4602308 1
#10 4 4 2 1.2528505 1
#11 3 4 3 1.1505770 2
#12 3 4 4 0.8693248 2
#13 3 4 5 0.8181881 1
#14 3 4 6 0.8948932 1
#15 3 4 7 0.7926197 2
#16 5 4 8 0.5113676 2
#17 5 6 1 NA NA
#18 5 8 1 NA NA
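Both answers number the carb 3 rows fac = 1, 2, 3, whereas the desired output in the question repeats the single data2 row (fac = 1) three times. One possible way to get that behaviour is to cap fac at the number of rows data2 holds for each carb before joining. This is only a sketch under assumptions: `d1`, `d2` and the helper column `n2` are hypothetical names, and it assumes fac in data2 runs 1, 2, ... within each carb, as it does in the question's data:

```r
library(dplyr)

# Hypothetical stand-ins: carb 3 has three rows in d1 but only one in d2
d1 <- data.frame(gear = c(3, 3, 3, 4), carb = c(3, 3, 3, 4))
d2 <- data.frame(carb = c(3, 4), fac = c(1L, 1L), hello = c(NA, 0.46))

# Number of rows d2 holds for each carb value
n2 <- count(d2, carb, name = "n2")

res <- d1 %>%
  group_by(carb) %>%
  mutate(fac = row_number()) %>%   # running index within each carb group
  ungroup() %>%
  left_join(n2, by = "carb") %>%
  mutate(fac = pmin(fac, n2)) %>%  # once d2 runs out of rows, reuse its last one
  select(-n2) %>%
  left_join(d2, by = c("carb", "fac"))
res
```

For carb values that do not occur in `d2` at all, `n2` is NA, so fac becomes NA rather than 1; whether that matters depends on what the bound columns are used for afterwards.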