Changing the form when merging multiple data frames

Time: 2018-05-01 20:52:10

Tags: r dataframe merge

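I have several data frames that all share the same format, and I would like to merge them into a result with a different form. The data frames look like this: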

price <- data.frame(Year= c(2001, 2002, 2003),
                    A=c(1,2,3),B=c(2,3,4), C=c(4,5,6))
size <- data.frame(Year= c(2001, 2002, 2003), 
                   A=c(1,2,3),B=c(2,3,4), C=c(4,5,6))
performance <- data.frame(Year= c(2001, 2002, 2003),
                          A=c(1,2,3),B=c(2,3,4), C=c(4,5,6))

> price
  Year A B C
1 2001 1 2 4
2 2002 2 3 5
3 2003 3 4 6

> size
  Year A B C
1 2001 1 2 4
2 2002 2 3 5
3 2003 3 4 6

> performance
  Year A B C
1 2001 1 2 4
2 2002 2 3 5
3 2003 3 4 6

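The merged data should be ordered by name first and then by date. Each of my 20 data frames has more than 2,000 names and 180 dates, so sorting by typing out specific names is not practical.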

5 answers:

Answer 0 (score: 2)

You need to convert the data frames to long format and then join them together:

library(tidyverse)

price_long <- price %>% gather(key, value = "price", -Year)
size_long <- size %>% gather(key, value = "size", -Year)
performance_long <- performance %>% gather(key, value = "performance", -Year)

price_long %>% 
  left_join(size_long) %>% 
  left_join(performance_long)

Joining, by = c("Year", "key")
Joining, by = c("Year", "key")

  Year key price size performance
1 2001   A     1    1           1
2 2002   A     2    2           2
3 2003   A     3    3           3
4 2001   B     2    2           2
5 2002   B     3    3           3
6 2003   B     4    4           4
7 2001   C     4    4           4
8 2002   C     5    5           5
9 2003   C     6    6           6
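
As a hedged sketch that is not part of the original answer: with around 20 data frames, writing one gather and one left_join per table gets tedious. Assuming the tables can be collected in a named list (the names `tables`, `long_tables` and `result` are illustrative), the same reshape-then-join idea could be looped with purrr. The sketch uses pivot_longer, which newer tidyr versions recommend over gather, and adds arrange to get the name-then-year ordering asked for in the question.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Example data from the question
price <- data.frame(Year = c(2001, 2002, 2003),
                    A = c(1, 2, 3), B = c(2, 3, 4), C = c(4, 5, 6))
size <- price
performance <- price

# Hypothetical named list; the list names become the value column names
tables <- list(price = price, size = size, performance = performance)

# Reshape each table to long format: one row per Year/key pair
long_tables <- imap(tables, function(df, nm) {
  pivot_longer(df, cols = -Year, names_to = "key", values_to = nm)
})

# Join all long tables on Year and key, then sort by name and year
result <- long_tables %>%
  reduce(function(x, y) left_join(x, y, by = c("Year", "key"))) %>%
  arrange(key, Year)

print(result)
```

Naming the join columns explicitly in `by` also avoids the "Joining, by = ..." messages shown above.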

Answer 1 (score: 1)

We can combine the data frames, then gather and spread the combined data frame.

library(tidyverse)

dat <- list(price, size, performance) %>%
  setNames(c("price", "size", "performance")) %>%
  bind_rows(.id = "type") %>%
  gather(name, value, A:C) %>%
  spread(type, value) %>%
  arrange(name, Year)

dat
#   Year name performance price size
# 1 2001    A           1     1    1
# 2 2002    A           2     2    2
# 3 2003    A           3     3    3
# 4 2001    B           2     2    2
# 5 2002    B           3     3    3
# 6 2003    B           4     4    4
# 7 2001    C           4     4    4
# 8 2002    C           5     5    5
# 9 2003    C           6     6    6

Answer 2 (score: 1)

Passing a named list to bind_rows with .id is very convenient in this case. A solution could be:

library(tidyverse)

bind_rows(list(price = price, size = size, performance = performance), .id = "Type") %>%
  gather(Key, Value, -Type, -Year) %>%
  spread(Type, Value)
#   Year Key performance price size
# 1 2001   A           1     1    1
# 2 2001   B           2     2    2
# 3 2001   C           4     4    4
# 4 2002   A           2     2    2
# 5 2002   B           3     3    3
# 6 2002   C           5     5    5
# 7 2003   A           3     3    3
# 8 2003   B           4     4    4
# 9 2003   C           6     6    6

The solution above is very similar to the one by @www. It just avoids using setNames.
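
The gather/spread pipelines in this answer and the previous one could also be written with pivot_longer and pivot_wider, which newer tidyr versions recommend instead. The following is a minimal sketch under that assumption, not the answerer's code; the variable name `combined` is illustrative, and arrange is added so that rows come out ordered by name and then by year, as the question asks.

```r
library(dplyr)
library(tidyr)

# Example data from the question
price <- data.frame(Year = c(2001, 2002, 2003),
                    A = c(1, 2, 3), B = c(2, 3, 4), C = c(4, 5, 6))
size <- price
performance <- price

combined <- bind_rows(
  list(price = price, size = size, performance = performance),
  .id = "Type"                                   # record the source table
) %>%
  pivot_longer(cols = -c(Type, Year),            # A, B, C become rows
               names_to = "Key", values_to = "Value") %>%
  pivot_wider(names_from = Type,                 # one column per source table
              values_from = Value) %>%
  arrange(Key, Year)                             # sort by name, then year

print(combined)
```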

Answer 3 (score: 1)

You can use data.table:

library(data.table)
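a <- list(price = price, size = size, performance = performance)
# stack the tables with a "name" id column, melt A:C to long, then cast one column per table
dcast(melt(rbindlist(a, use.names = TRUE, idcol = "name"), id.vars = 1:2), variable + Year ~ name)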
   variable Year performance price size
1:        A 2001           1     1    1
2:        A 2002           2     2    2
3:        A 2003           3     3    3
4:        B 2001           2     2    2
5:        B 2002           3     3    3
6:        B 2003           4     4    4
7:        C 2001           4     4    4
8:        C 2002           5     5    5
9:        C 2003           6     6    6
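
The one-liner above packs three steps into a single call. As a sketch only, the same steps can be unpacked with named arguments, which may be easier to adapt to 20 tables. The intermediate names `tables`, `stacked`, `long` and `wide` are illustrative and not from the original answer.

```r
library(data.table)

# Example data from the question
price <- data.frame(Year = c(2001, 2002, 2003),
                    A = c(1, 2, 3), B = c(2, 3, 4), C = c(4, 5, 6))
size <- price
performance <- price

tables <- list(price = price, size = size, performance = performance)

# 1. Stack all tables; "name" records which table each row came from
stacked <- rbindlist(tables, use.names = TRUE, idcol = "name")

# 2. Melt columns A, B, C into a long variable/value pair
long <- melt(stacked, id.vars = c("name", "Year"),
             variable.name = "variable", value.name = "value")

# 3. Cast back to wide with one column per source table
wide <- dcast(long, variable + Year ~ name, value.var = "value")

print(wide)
```

Passing `value.var` explicitly avoids relying on dcast guessing which column holds the values.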

Answer 4 (score: 0)

For completeness, here is a base R answer that uses no packages.

# gather the data.frames into a list
myList <- mget(ls())

Note that the three data.frames are the only objects in my environment.

# get the final data.frame
Reduce(merge, 
       Map(function(x, y) setNames(cbind(x[1], stack(x[-1])), c("Year", y, "ID")),
           myList, names(myList)))

This returns

  Year ID performance price size
1 2001  A           1     1    1
2 2001  B           2     2    2
3 2001  C           4     4    4
4 2002  A           2     2    2
5 2002  B           3     3    3
6 2002  C           5     5    5
7 2003  A           3     3    3
8 2003  B           4     4    4
9 2003  C           6     6    6
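
Two caveats apply to the base R approach: mget(ls()) only works when the environment holds nothing but the data frames, and the merged rows come out ordered by year and then by name, whereas the question asks for name first. The following is a hedged variant, assuming the tables are listed explicitly (the names `tables`, `long_tables` and `merged` are illustrative), that adds an order() step at the end.

```r
# Example data from the question
price <- data.frame(Year = c(2001, 2002, 2003),
                    A = c(1, 2, 3), B = c(2, 3, 4), C = c(4, 5, 6))
size <- price
performance <- price

# Explicit named list instead of mget(ls())
tables <- list(price = price, size = size, performance = performance)

# Stack columns A, B, C of each table; name the value column after the table
long_tables <- Map(function(df, nm) {
  setNames(cbind(df[1], stack(df[-1])), c("Year", nm, "ID"))
}, tables, names(tables))

# Merge on the shared key columns, then sort by name and year
merged <- Reduce(function(x, y) merge(x, y, by = c("Year", "ID")), long_tables)
merged <- merged[order(merged$ID, merged$Year), ]
rownames(merged) <- NULL

print(merged)
```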