I have two data frames:
d1.Kids <- c("Jack", "Jill", "Jillian", "John", "James")
d1.States <- c("CA", "MA", "DE", "HI", "PA")
d1 <- data.frame(d1.Kids, d1.States)
d1
  d1.Kids d1.States
1    Jack        CA
2    Jill        MA
3 Jillian        DE
4    John        HI
5   James        PA
d2.Ages <- c(10, 7, 12, 30)
d2.Kids <- c("Jill", "Jillian", "Jack", "Mary")
d2 <- data.frame(d2.Kids, d2.Ages)
d2
  d2.Kids d2.Ages
1    Jill      10
2 Jillian       7
3    Jack      12
4    Mary      30
When I merge the two data frames, I get the following:
merge(d1, d2)
Result:
   d1.Kids d1.States d2.Kids d2.Ages
 1    Jack        CA    Jill      10
 2    Jill        MA    Jill      10
 3 Jillian        DE    Jill      10
 4    John        HI    Jill      10
 5   James        PA    Jill      10
 6    Jack        CA Jillian       7
 7    Jill        MA Jillian       7
 8 Jillian        DE Jillian       7
 9    John        HI Jillian       7
10   James        PA Jillian       7
11    Jack        CA    Jack      12
12    Jill        MA    Jack      12
13 Jillian        DE    Jack      12
14    John        HI    Jack      12
15   James        PA    Jack      12
16    Jack        CA    Mary      30
17    Jill        MA    Mary      30
18 Jillian        DE    Mary      30
19    John        HI    Mary      30
20   James        PA    Mary      30
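The 20 rows above are every pairing of the 5 rows of d1 with the 4 rows of d2. A small check of why this happens: merge() defaults to by = intersect(names(x), names(y)), and the prefixed column names share nothing. This sketch recreates the data with named arguments instead of separate vectors, which gives the same column names; stringsAsFactors = FALSE is added only so the behaviour does not depend on the R version.

```r
# Recreate the two data frames from the question (same column names)
d1 <- data.frame(d1.Kids = c("Jack", "Jill", "Jillian", "John", "James"),
                 d1.States = c("CA", "MA", "DE", "HI", "PA"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(d2.Kids = c("Jill", "Jillian", "Jack", "Mary"),
                 d2.Ages = c(10, 7, 12, 30),
                 stringsAsFactors = FALSE)

# merge() uses the shared column names as keys by default;
# "d1.*" and "d2.*" never match, so there are no key columns
common <- intersect(names(d1), names(d2))
print(common)      # character(0)

# With no keys, merge() returns the Cartesian product:
# nrow(d1) * nrow(d2) rows and ncol(d1) + ncol(d2) columns
cross <- merge(d1, d2)
print(dim(cross))  # 20 rows, 4 columns
```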
I want to get this result:
     kids ages states
1    jack   12     CA
2    jill   10     MA
3 jillian    7     DE
4    john   NA     HI
5   james   NA     PA
6    Mary   30     NA
Answer:
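Without by, merge() falls back to the column names the two data frames have in common. Here d1.Kids and d2.Kids have different names, so there is no common column and you get a cross join. Use the by options to avoid this: since the key columns are named differently, give them separately with by.x and by.y, and add all = TRUE so that unmatched rows from both data frames are kept (a full outer join):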
out <- merge(d1,d2, by.x = 'd1.Kids', by.y = 'd2.Kids', all = TRUE)
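To see what all = TRUE contributes, the sketch below compares it with the default inner join and with a left join (all.x = TRUE) on the same keys. The data is rebuilt with named arguments and stringsAsFactors = FALSE so the snippet runs on its own; the variable names inner and left are only for illustration.

```r
d1 <- data.frame(d1.Kids = c("Jack", "Jill", "Jillian", "John", "James"),
                 d1.States = c("CA", "MA", "DE", "HI", "PA"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(d2.Kids = c("Jill", "Jillian", "Jack", "Mary"),
                 d2.Ages = c(10, 7, 12, 30),
                 stringsAsFactors = FALSE)

# Inner join (default all = FALSE): only kids present in both frames
inner <- merge(d1, d2, by.x = "d1.Kids", by.y = "d2.Kids")

# Left join: every row of d1, with NA age where d2 has no match
left <- merge(d1, d2, by.x = "d1.Kids", by.y = "d2.Kids", all.x = TRUE)

# Full outer join: all = TRUE sets both all.x and all.y to TRUE
out <- merge(d1, d2, by.x = "d1.Kids", by.y = "d2.Kids", all = TRUE)

print(nrow(inner))  # 3: Jack, Jill, Jillian
print(nrow(left))   # 5: adds John and James
print(nrow(out))    # 6: also adds Mary
print(out)          # key column keeps the name from by.x ("d1.Kids")
```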
Then rename the columns of 'out' by stripping the prefix (everything up to and including the first "."):
names(out) <- sub("^[^.]+\\.", "", names(out))
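An alternative ordering of the same two steps, offered as a sketch rather than the answer's method: strip the prefixes first, so both frames share a Kids column and merge() finds the key by itself, then reorder the columns to match the kids, ages, states layout in the question. The helper strip_prefix is a hypothetical name introduced here; it applies the same sub() pattern as above.

```r
d1 <- data.frame(d1.Kids = c("Jack", "Jill", "Jillian", "John", "James"),
                 d1.States = c("CA", "MA", "DE", "HI", "PA"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(d2.Kids = c("Jill", "Jillian", "Jack", "Mary"),
                 d2.Ages = c(10, 7, 12, 30),
                 stringsAsFactors = FALSE)

# Hypothetical helper: drop everything up to and including the first "."
strip_prefix <- function(df) {
  names(df) <- sub("^[^.]+\\.", "", names(df))
  df
}

d1s <- strip_prefix(d1)  # columns: Kids, States
d2s <- strip_prefix(d2)  # columns: Kids, Ages

# "Kids" is now shared, so the default by = intersect(...) picks it up
out <- merge(d1s, d2s, all = TRUE)

# Put the columns in the order shown in the question
out <- out[, c("Kids", "Ages", "States")]
print(out)
```

Note that merge() sorts the result by the key column by default (sort = TRUE), so the rows come out alphabetically rather than in the order of d1.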