Merging two data frames, removing duplicates, and aggregating in R

Time: 2016-06-15 09:23:49

Tags: r dataframe merge

I have two data frames in R, named house and candidates.

house

      House       Region                 Military_Strength
1 Stark           The North              20000
2 Targaryen       Slaver's Bay           110000
3 Lannister       The Westerlands        60000
4 Baratheon       The Stormlands         40000
5 Tyrell          The Reach              30000


candidates

  House               Name                  Region
1 Lannister           Jamie Lannister       Westros
2 Stark               Robb Stark            North
3 Stark               Arya Stark            Westros
4 Lannister           Cersi Lannister       Westros
5 Targaryen           Daenerys Targaryen    Mereene
6 Baratheon           Robert Baratheon      Westros
7 Mormont             Jorah Mormont         Mereene

I want to merge the two data frames on the House column. To do that, I ran:

merge(candidates, house, by="House", sort=FALSE)

The output is:

       House        Name         Region.x        Region.y   Military_Strength
 1 Lannister    Jamie Lannister  Westros     The Westerlands             60000
 2 Lannister    Cersi Lannister  Westros     The Westerlands             60000
 3 Stark         Robb Stark      North       The North                   20000
 4 Stark         Arya Stark      Westros     The North                   20000
 5 Targaryen Daenerys Targaryen  Mereene     Slaver's Bay                110000
 6 Baratheon   Robert Baratheon  Westros     The Stormlands              40000

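For each house I want to remove the second candidate (if there is one), but that candidate's Military_Strength should be added to the first candidate of the same house.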

For example:

4 Stark         Arya Stark      Westros     The North                   20000

would be removed, but its 20000 would be added to the Military_Strength of row 3 (Robb Stark). What is a proper way to do this?

1 answer:

Answer 0: (score: 1)

Starting from df1, the data.frame obtained after merge(), you can continue as follows:

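# Replace each row's Military_Strength with the total for its House
df1$Military_Strength <- with(df1, ave(Military_Strength, House, FUN=sum))
# Keep only the first row of each House
df1[!duplicated(df1$House),]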
#      House               Name Region.x        Region.y Military_Strength
#1 Lannister    Jamie Lannister  Westros The Westerlands            120000
#3     Stark         Robb Stark    North       The North             40000
#5 Targaryen Daenerys Targaryen  Mereene    Slaver's Bay            110000
#6 Baratheon   Robert Baratheon  Westros  The Stormlands             40000

Data used in this example:

df1 <- structure(list(House = structure(c(2L, 2L, 3L, 3L, 4L, 1L), 
                .Label = c("Baratheon", "Lannister", "Stark", "Targaryen"), 
                class = "factor"), Name = structure(c(4L, 2L, 5L, 1L, 3L, 6L), 
                .Label = c("Arya Stark", "Cersi Lannister", "Daenerys Targaryen", 
                "Jamie Lannister", "Robb Stark", "Robert Baratheon"), 
                class = "factor"), Region.x = structure(c(3L, 3L, 2L, 3L, 1L, 3L), 
                .Label = c("Mereene", "North", "Westros"), class = "factor"), 
                Region.y = structure(c(4L, 4L, 2L, 2L, 1L, 3L), 
                .Label = c("Slaver's Bay", "The North", "The Stormlands",
                  "The Westerlands"), class = "factor"), 
                Military_Strength = c(60000L, 60000L, 20000L, 20000L, 110000L, 
                40000L)), .Names = c("House", "Name", "Region.x", "Region.y", 
                "Military_Strength"), class = "data.frame", row.names = c("1", 
                "2", "3", "4", "5", "6"))
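
For an end-to-end check, the question's merge() call and the two steps from the answer can be put together in one script. This is only a sketch: the two data frames are retyped by hand from the tables in the question, and the names merged, result and strength are made up for illustration. Note that merge() with sort=FALSE does not guarantee a row order, so which candidate counts as the "first" of a house may depend on that order; the per-house totals do not.

```r
# Data retyped from the question's tables (an assumption: plain character
# columns are used here instead of factors).
house <- data.frame(
  House = c("Stark", "Targaryen", "Lannister", "Baratheon", "Tyrell"),
  Region = c("The North", "Slaver's Bay", "The Westerlands",
             "The Stormlands", "The Reach"),
  Military_Strength = c(20000, 110000, 60000, 40000, 30000),
  stringsAsFactors = FALSE
)
candidates <- data.frame(
  House = c("Lannister", "Stark", "Stark", "Lannister",
            "Targaryen", "Baratheon", "Mormont"),
  Name = c("Jamie Lannister", "Robb Stark", "Arya Stark", "Cersi Lannister",
           "Daenerys Targaryen", "Robert Baratheon", "Jorah Mormont"),
  Region = c("Westros", "North", "Westros", "Westros",
             "Mereene", "Westros", "Mereene"),
  stringsAsFactors = FALSE
)

# Inner join on House: Mormont (no row in house) and Tyrell (no candidate)
# drop out of the result.
merged <- merge(candidates, house, by = "House", sort = FALSE)

# Overwrite each row's strength with the sum over all rows of its House.
merged$Military_Strength <- ave(merged$Military_Strength, merged$House,
                                FUN = sum)

# Keep the first row seen for each House and drop the rest.
result <- merged[!duplicated(merged$House), ]

# Per-house totals, looked up by name so the check ignores row order.
strength <- setNames(result$Military_Strength, result$House)
print(result)
```

Because the sum is taken over the merged rows, a house with two candidates ends up with twice its original Military_Strength (for example 120000 for Lannister), which is what the question asks for.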