Combine 2 columns in R, giving priority to one of them

Asked: 2016-10-06 20:07:44

Tags: r merge multiple-columns

I'm new to R. I have a data.frame with 2 columns that both hold the sex of the animals, but one of them has some corrections and the other does not.

The data.frame I want looks like this:

    id  sex  father mother birth.date  farm
    0    1    john   ray   05/06/94     1
    1    1    doug   ana   18/02/93     NA
    2    2    bryan  kim   21/03/00     3

But this is the data.frame I get by using merge on the 2 other data.frames:

    id  sex.x  father  mother  birth.date  sex.y  farm
    0   2      john    ray     05/06/94    1      1
    1   1      doug    ana     18/02/93    NA     NA
    2   2      bryan   kim     21/03/00    2      3

data.frame 1, Animals (the sex of some animals is wrong):

    id  sex  father  mother  birth.date
    0   2    john    ray     05/06/94
    1   1    doug    ana     18/02/93
    2   2    bryan   kim     21/03/00

data.frame 2, Farm (has the correct sex):

    id  farm  sex
    0   1     1
    2   3     2

The code I used is: Animals_Farm <- merge(Animals, Farm, by="id", all.x=TRUE)

I need to combine the two sex columns into one, giving priority to sex.y. How can I do that?

1 answer:

Answer 0 (score: 0)

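If I understand your example correctly, your situation is similar to the one shown below, which is based on the example in the documentation of the merge function.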

> (authors <- data.frame(
      surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
      nationality = c("US", "Australia", "US", "UK", "Australia"),
      deceased = c("yes", rep("no", 3), "yes")))

   surname nationality deceased
1    Tukey          US      yes
2 Venables   Australia       no
3  Tierney          US       no
4   Ripley          UK       no
5   McNeil   Australia      yes

> (books <- data.frame(
      name = I(c("Tukey", "Venables", "Tierney",
                 "Ripley", "Ripley", "McNeil", "R Core")),
      title = c("Exploratory Data Analysis",
                "Modern Applied Statistics ...", "LISP-STAT",
                "Spatial Statistics", "Stochastic Simulation",
                "Interactive Data Analysis",
                "An Introduction to R"),
      deceased = c("yes", rep("no", 6))))

      name                         title deceased
1    Tukey     Exploratory Data Analysis      yes
2 Venables Modern Applied Statistics ...       no
3  Tierney                     LISP-STAT       no
4   Ripley            Spatial Statistics       no
5   Ripley         Stochastic Simulation       no
6   McNeil     Interactive Data Analysis       no
7   R Core          An Introduction to R       no

> (m1 <- merge(authors, books, by.x = "surname", by.y = "name"))

   surname nationality deceased.x                         title deceased.y
1   McNeil   Australia        yes     Interactive Data Analysis         no
2   Ripley          UK         no            Spatial Statistics         no
3   Ripley          UK         no         Stochastic Simulation         no
4  Tierney          US         no                     LISP-STAT         no
5    Tukey          US        yes     Exploratory Data Analysis        yes
6 Venables   Australia         no Modern Applied Statistics ...         no

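Here authors might represent your first data frame and books your second, and deceased might be the value that appears in both data frames but is up to date in only one of them (authors).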

The simplest way to keep only the correct values of deceased (the ones from authors) is to exclude the incorrect column from the merge.

> (m2 <- merge(authors, books[names(books) != "deceased"], by.x = "surname", by.y = "name"))

   surname nationality deceased                         title
1   McNeil   Australia      yes     Interactive Data Analysis
2   Ripley          UK       no            Spatial Statistics
3   Ripley          UK       no         Stochastic Simulation
4  Tierney          US       no                     LISP-STAT
5    Tukey          US      yes     Exploratory Data Analysis
6 Venables   Australia       no Modern Applied Statistics ...

The expression books[names(books) != "deceased"] simply subsets the data frame books to remove its deceased column, so the final merge keeps only the correct deceased column, the one from authors.
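For the data in the question, dropping one of the columns before merging may not be enough on its own: animal 1 has no row in Farm, so its sex has to come from Animals. Below is a minimal sketch of one alternative, assuming the two data frames look like the tables in the question (the column types are my assumption). It keeps both columns through the merge, takes sex.y wherever it is present, and falls back to sex.x otherwise.

```r
# Rebuild the question's two data frames from the tables shown there.
# Column types are an assumption (integers for id/sex/farm, character otherwise).
Animals <- data.frame(
  id         = c(0L, 1L, 2L),
  sex        = c(2L, 1L, 2L),   # uncorrected values
  father     = c("john", "doug", "bryan"),
  mother     = c("ray", "ana", "kim"),
  birth.date = c("05/06/94", "18/02/93", "21/03/00"),
  stringsAsFactors = FALSE
)
Farm <- data.frame(
  id   = c(0L, 2L),
  farm = c(1L, 3L),
  sex  = c(1L, 2L)              # corrected values
)

# Left join, as in the question. The shared non-key column "sex" comes back
# as sex.x (from Animals) and sex.y (from Farm).
Animals_Farm <- merge(Animals, Farm, by = "id", all.x = TRUE)

# Prefer sex.y; fall back to sex.x where Farm had no matching row (sex.y is NA).
Animals_Farm$sex <- ifelse(is.na(Animals_Farm$sex.y),
                           Animals_Farm$sex.x,
                           Animals_Farm$sex.y)

# Drop the two helper columns and restore the column order from the question.
Animals_Farm <- Animals_Farm[c("id", "sex", "father", "mother",
                               "birth.date", "farm")]
print(Animals_Farm)
```

With the question's data, the combined sex column comes out as 1, 1, 2 for ids 0, 1, 2, which matches the desired table. Everything here is base R, so no extra packages are needed.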