I have two datasets like these:
df <- data.frame(id = 1:20,
Sex = rep(x = c(0,1), each=10),
age = c(25,56,29,42,33,33,33,25,25,25,26,57,30,43,34,34,34,26,26,26),
ov = letters[1:20])
df1 <- data.frame(Sex = c(0,0,0,1,1),
age = c(25,33,39,41,43))
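For each Sex and age group in df1, I want to take one random row from df with the same Sex and age. Not every age in df1 has a match in df, so for each df1 group without a match I want to impute the value of the variable ov from a donor in df with the same Sex and the closest age, like this: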
df3 <- rbind(df[c(8,7),2:4],c(0,39,"d"),c(1,41,"n"),df[14,2:4])
Note that the donor for Sex = 0 and age = 39 is df[4,], and the donor for Sex = 1 and age = 41 is df[14,].
How can I do this?
Answer 0 (score: 1)
Using data.table, you can try something like this:
1) Convert the data to data.table and set keys, starting with df1:
library(data.table) # load the package
dt1 <- as.data.table(df1) # convert to data.table
dt1[, newSex := Sex] # this will serve as grouping column
dt1[, newage := age] # also this
setkey(dt1, Sex, age) # set data.tables keys
dt1
Sex age newSex newage
1: 0 25 0 25
2: 0 33 0 33
3: 0 39 0 39
4: 1 41 1 41
5: 1 43 1 43
# we do similar with df:
dt <- as.data.table(df)
setkey(dt, Sex, age)
dt
id Sex age ov
1: 1 0 25 a
2: 8 0 25 h
3: 9 0 25 i
4: 10 0 25 j
5: 3 0 29 c
6: 5 0 33 e
7: 6 0 33 f
8: 7 0 33 g
9: 4 0 42 d
10: 2 0 56 b
11: 11 1 26 k
12: 18 1 26 r
13: 19 1 26 s
14: 20 1 26 t
15: 13 1 30 m
16: 15 1 34 o
17: 16 1 34 p
18: 17 1 34 q
19: 14 1 43 n
20: 12 1 57 l
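As a side note on what setting keys does in step 1, here is a minimal sketch on a made-up table (toy and its values are hypothetical and not part of the question's data). It only illustrates that setkey sorts the table by the key columns in place and records them as the key, which is why dt prints in Sex and age order above.

```r
library(data.table)

# Hypothetical three-row table, deliberately unsorted
toy <- data.table(Sex = c(1, 0, 0), age = c(30, 40, 20))

setkey(toy, Sex, age)  # sorts by Sex, then age, by reference

key(toy)   # the recorded key columns: "Sex" "age"
toy$age    # rows are now ordered (0,20), (0,40), (1,30)
```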
2) Using a rolling join, we assign each row of dt to its new group (newSex, newage), which gives dtnew:
dtnew <- dt1[dt, roll = "nearest"]
dtnew
Sex age newSex newage id ov
1: 0 25 0 25 1 a
2: 0 25 0 25 8 h
3: 0 25 0 25 9 i
4: 0 25 0 25 10 j
5: 0 29 0 25 3 c
6: 0 33 0 33 5 e
7: 0 33 0 33 6 f
8: 0 33 0 33 7 g
9: 0 42 0 39 4 d
10: 0 56 0 39 2 b
11: 1 26 1 41 11 k
12: 1 26 1 41 18 r
13: 1 26 1 41 19 s
14: 1 26 1 41 20 t
15: 1 30 1 41 13 m
16: 1 34 1 41 15 o
17: 1 34 1 41 16 p
18: 1 34 1 41 17 q
19: 1 43 1 43 14 n
20: 1 57 1 43 12 l
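To see what roll = "nearest" does in isolation, here is a small sketch on made-up tables (x, y, k and label are hypothetical names, not from the question). Each row of y is matched to the row of x whose key value is closest; the on argument is used here instead of keys, which should be equivalent for this purpose.

```r
library(data.table)

# Hypothetical lookup table and query values
x <- data.table(k = c(10, 20), label = c("ten", "twenty"))
y <- data.table(k = c(11, 19, 30))

# For every k in y, pick the row of x with the nearest k
res <- x[y, on = "k", roll = "nearest"]
res$label  # "ten" "twenty" "twenty"
```

The k column of the result holds the values from y, in the same way that Sex and age in dtnew come from dt while newSex and newage come from dt1.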
3) Now we can sample. In your case we can simply reorder the rows randomly and then take the first row of each group:
dtnew <- dtnew[sample(.N)] #create random order
sampleDT <- unique(dtnew, by = c("newSex", "newage")) #take first unique by newSex and newage
sampleDT
Sex age newSex newage id ov
1: 0 56 0 39 2 b
2: 0 29 0 25 3 c
3: 1 43 1 43 14 n
4: 1 34 1 41 16 p
5: 0 33 0 33 7 g
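Note that the rolling join above assigns every row of df to its nearest df1 group, so a group such as Sex = 0, age = 39 can draw any row rolled to it (b, with age 56, in the sample output), and the result changes on every run. If the donor should instead be limited to the df rows of the same Sex whose age is closest to the df1 age, as the question describes, a base R sketch along the following lines might work. pick_donor is a hypothetical helper, the seed is arbitrary, and breaking ties at random among equally close rows is an assumption.

```r
df <- data.frame(id = 1:20,
                 Sex = rep(c(0, 1), each = 10),
                 age = c(25,56,29,42,33,33,33,25,25,25,
                         26,57,30,43,34,34,34,26,26,26),
                 ov = letters[1:20])
df1 <- data.frame(Sex = c(0, 0, 0, 1, 1),
                  age = c(25, 33, 39, 41, 43))

set.seed(1)  # arbitrary seed, only to make the random draws repeatable

# Hypothetical helper: one random ov among the same-Sex rows of df
# whose age is closest to the requested age (exact matches have gap 0)
pick_donor <- function(sex, age) {
  pool <- df[df$Sex == sex, ]
  gap <- abs(pool$age - age)
  candidates <- pool$ov[gap == min(gap)]
  candidates[sample.int(length(candidates), 1)]
}

df1$ov <- mapply(pick_donor, df1$Sex, df1$age)
df1
```

With the data from the question, the groups without an exact match get ov = "d" (Sex = 0, age = 39) and ov = "n" (Sex = 1, age = 41), which agrees with the donors df[4,] and df[14,] named in the question; the remaining groups get a random exact match.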