Add up multiple data frames with the same column names, matched on a specific column's values, in R

Time: 2016-08-20 05:41:58

Tags: r dataframe merge

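I have multiple data frames with the same column names and dimensions:

df1
  device_id price tax
1         a   200   5
2         b   100   2
3         c    50   1

df2
  device_id price tax
1         b   200   7
2         a   100   3
3         c    50   1

df3
  device_id price tax
1         c    50   5
2         b   300   1
3         a    50   2

What I want to do is create another data frame, df, in which I add up the price and tax values of rows with matching device_id across the three data frames above.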

So df would look like:

df
  device_id price tax
1         a   350  10
2         b   600  10
3         c   150   7

How can I do this? Also, it would be great if the solution generalized to a larger number of data frames, not just 3.

2 Answers:

Answer 0 (score: 1)

First, put all the data frames into a list (called dflist here; its definition is given below). Then, after row-binding the list elements, aggregate() makes this easy:

aggregate(. ~ device_id, do.call(rbind, dflist), sum)
#   device_id price tax
# 1         a   350  10
# 2         b   600  10
# 3         c   150   7
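For readers who want to run the aggregate() approach without the dput() output, here is a minimal self-contained sketch. It rebuilds df1, df2 and df3 from the values in the question with plain data.frame() calls (so device_id may come out as character rather than factor depending on the R version, which does not change the sums) and keeps them in a list, which is what lets the same code handle any number of data frames.

```r
# Rebuild the three example data frames from the values in the question
df1 <- data.frame(device_id = c("a", "b", "c"), price = c(200, 100, 50), tax = c(5, 2, 1))
df2 <- data.frame(device_id = c("b", "a", "c"), price = c(200, 100, 50), tax = c(7, 3, 1))
df3 <- data.frame(device_id = c("c", "b", "a"), price = c(50, 300, 50), tax = c(5, 1, 2))

# Any number of data frames can go in this list; nothing below depends on its length
dflist <- list(df1, df2, df3)

# Stack all rows into one data frame
stacked <- do.call(rbind, dflist)

# Sum every non-id column (here price and tax) within each device_id
df <- aggregate(. ~ device_id, data = stacked, FUN = sum)
print(df)
```

The totals match the expected df in the question: a = 350/10, b = 600/10, c = 150/7.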

Or you can use the data.table package:

library(data.table)
rbindlist(dflist)[, lapply(.SD, sum), by = device_id]
#    device_id price tax
# 1:         a   350  10
# 2:         b   600  10
# 3:         c   150   7

Or with dplyr:

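library(dplyr)
# summarize_each()/funs() are deprecated in current dplyr;
# there, use summarise(across(everything(), sum)) instead
bind_rows(dflist) %>% 
    group_by(device_id) %>%
    summarize_each(funs(sum))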
# Source: local data frame [3 x 3]
#
#   device_id price   tax
#      <fctr> <int> <int>
# 1         a   350    10
# 2         b   600    10
# 3         c   150     7

Data:

dflist <- structure(list(df1 = structure(list(device_id = structure(1:3, .Label = c("a", 
"b", "c"), class = "factor"), price = c(200L, 100L, 50L), tax = c(5L, 
2L, 1L)), .Names = c("device_id", "price", "tax"), class = "data.frame", row.names = c("1", 
"2", "3")), df2 = structure(list(device_id = structure(c(2L, 
1L, 3L), .Label = c("a", "b", "c"), class = "factor"), price = c(200L, 
100L, 50L), tax = c(7L, 3L, 1L)), .Names = c("device_id", "price", 
"tax"), class = "data.frame", row.names = c("1", "2", "3")), 
    df3 = structure(list(device_id = structure(c(3L, 2L, 1L), .Label = c("a", 
    "b", "c"), class = "factor"), price = c(50L, 300L, 50L), 
        tax = c(5L, 1L, 2L)), .Names = c("device_id", "price", 
    "tax"), class = "data.frame", row.names = c("1", "2", "3"
    ))), .Names = c("df1", "df2", "df3"))

Answer 1 (score: 1)

With base R, we can gather all the data.frame objects into a list with mget(paste0("df", 1:3)), rbind them, and then use by():

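# stack df1, df2, df3 into one data frame
dfN <- do.call(rbind, mget(paste0("df", 1:3)))
# column sums of price and tax for each device_id, bound into one matrix
do.call(rbind, by(dfN[-1], dfN[1], FUN = colSums))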
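The call mget(paste0("df", 1:3)) hard-codes the number of data frames. A minimal base R sketch, assuming the data frames live in the workspace under names of the form "df" followed by digits (a naming convention assumed here, not required by the answer), finds them with ls(pattern = ...) so the same lines work for 3 or 30 data frames.

```r
# Example data frames named df1, df2, df3 in the workspace (values from the question)
df1 <- data.frame(device_id = c("a", "b", "c"), price = c(200, 100, 50), tax = c(5, 2, 1))
df2 <- data.frame(device_id = c("b", "a", "c"), price = c(200, 100, 50), tax = c(7, 3, 1))
df3 <- data.frame(device_id = c("c", "b", "a"), price = c(50, 300, 50), tax = c(5, 1, 2))

# Assumed convention: every input is named "df" followed by one or more digits
df_names <- ls(pattern = "^df[0-9]+$")

# Fetch those objects by name and stack them into one data frame
dfN <- do.call(rbind, mget(df_names))

# by() applies colSums to the price/tax rows of each device_id group;
# rbind then turns the per-group vectors into a matrix with one row per device_id
totals <- do.call(rbind, by(dfN[-1], dfN["device_id"], FUN = colSums))
print(totals)
```

Because ls() returns names in alphabetical order (df1, df10, df2, ...), the stacking order changes once there are ten or more frames, but the per-device_id sums do not depend on it. The result is a matrix with device_id as row names; wrap it in as.data.frame() if a data frame is needed.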