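I'm completely new to R. I have a data.frame with 2 columns that both record the animals' sex, but one of them contains some corrections and the other does not.
The data.frame I want looks like this: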
id sex father mother birth.date farm
0 1 john ray 05/06/94 1
1 1 doug ana 18/02/93 NA
2 2 bryan kim 21/03/00 3
But what I actually get, by using merge on 2 other data.frames, is this:
id sex.x father mother birth.date sex.y farm
0 2 john ray 05/06/94 1 1
1 1 doug ana 18/02/93 NA NA
2 2 bryan kim 21/03/00 2 3
data.frame 1, Animals (the sex is wrong for some animals):
id sex father mother birth.date
0 2 john ray 05/06/94
1 1 doug ana 18/02/93
2 2 bryan kim 21/03/00
data.frame 2, Farm (with the correct sex):
id farm sex
0 1 1
2 3 2
The code I used is: Animals_Farm <- merge(Animals, Farm, by="id", all.x=TRUE)
I need to combine the two sex columns into one, giving priority to sex.y. How can I do that?
Answer 0 (score: 0)
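If I understand your example correctly, your situation is similar to the one shown below, which is based on the examples in the documentation of the merge function.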
> (authors <- data.frame(
surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
nationality = c("US", "Australia", "US", "UK", "Australia"),
deceased = c("yes", rep("no", 3), "yes")))
surname nationality deceased
1 Tukey US yes
2 Venables Australia no
3 Tierney US no
4 Ripley UK no
5 McNeil Australia yes
> (books <- data.frame(
name = I(c("Tukey", "Venables", "Tierney",
"Ripley", "Ripley", "McNeil", "R Core")),
title = c("Exploratory Data Analysis",
"Modern Applied Statistics ...", "LISP-STAT",
"Spatial Statistics", "Stochastic Simulation",
"Interactive Data Analysis",
"An Introduction to R"),
deceased = c("yes", rep("no", 6))))
name title deceased
1 Tukey Exploratory Data Analysis yes
2 Venables Modern Applied Statistics ... no
3 Tierney LISP-STAT no
4 Ripley Spatial Statistics no
5 Ripley Stochastic Simulation no
6 McNeil Interactive Data Analysis no
7 R Core An Introduction to R no
> (m1 <- merge(authors, books, by.x = "surname", by.y = "name"))
surname nationality deceased.x title deceased.y
1 McNeil Australia yes Interactive Data Analysis no
2 Ripley UK no Spatial Statistics no
3 Ripley UK no Stochastic Simulation no
4 Tierney US no LISP-STAT no
5 Tukey US yes Exploratory Data Analysis yes
6 Venables Australia no Modern Applied Statistics ... no
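Here authors could stand for your first data frame and books for your second. deceased is a column that appears in both data frames, but only one of them (authors) holds the up-to-date values.
The easiest way to keep only the correct values of deceased is simply to exclude the incorrect column from the merge: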
> (m2 <- merge(authors, books[names(books) != "deceased"],
by.x = "surname", by.y = "name"))
surname nationality deceased title
1 McNeil Australia yes Interactive Data Analysis
2 Ripley UK no Spatial Statistics
3 Ripley UK no Stochastic Simulation
4 Tierney US no LISP-STAT
5 Tukey US yes Exploratory Data Analysis
6 Venables Australia no Modern Applied Statistics ...
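The line of code books[names(books) != "deceased"] simply subsets the data frame books to remove its deceased column, so that the final merge keeps only the correct deceased column from authors.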
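For the data in the question, this idea needs a small adjustment, because Farm does not list every animal: dropping sex from Animals before merging would leave id 1 with no sex at all. The following is a minimal sketch of one alternative, assuming the two data frames are exactly as posted and that a missing sex.y should fall back to sex.x. The names dropped and the final column order are choices made for this illustration, not something from the original post.

```r
# Rebuild the two data frames from the question (values copied from the post)
Animals <- data.frame(
  id = c(0, 1, 2),
  sex = c(2, 1, 2),
  father = c("john", "doug", "bryan"),
  mother = c("ray", "ana", "kim"),
  birth.date = c("05/06/94", "18/02/93", "21/03/00")
)
Farm <- data.frame(id = c(0, 2), farm = c(1, 3), sex = c(1, 2))

# Dropping the uncorrected column first loses the sex of animals missing from Farm
dropped <- merge(Animals[names(Animals) != "sex"], Farm, by = "id", all.x = TRUE)

# Instead, keep both columns: merge() renames them sex.x (Animals) and sex.y (Farm)
Animals_Farm <- merge(Animals, Farm, by = "id", all.x = TRUE)

# Prefer sex.y; use sex.x only where sex.y is NA
Animals_Farm$sex <- ifelse(is.na(Animals_Farm$sex.y),
                           Animals_Farm$sex.x,
                           Animals_Farm$sex.y)

# Keep only the columns wanted in the final result, in the desired order
Animals_Farm <- Animals_Farm[c("id", "sex", "father", "mother", "birth.date", "farm")]
print(Animals_Farm)
```

With this approach the corrected value wins wherever Farm provides one, and animals that are absent from Farm keep the sex recorded in Animals.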