Merging two data frames with different numbers of rows and columns

Time: 2019-11-18 11:08:00

Tags: r

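I am trying to add together two data frames that share the same row and column names but have different numbers of rows and columns, and I am struggling.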

Creating the data frames:

library(dplyr)  # provides data_frame(), which is now deprecated in favor of tibble()
df1 = data_frame('Type' = 'Apple', '18/19' = 5)
df2 = data_frame('Type' = c('Apple', 'Pear', 'Banana'),
                 '16/17' = c(4,5,6), '17/18' = c(0,2,5), '18/19' = c(2,6,7))


df1:

Type    18/19
Apple   5


df2:

Type    16/17   17/18   18/19
Apple      4       0       2
Pear       5       2       6
Banana     6       5       7



What I want to end up with is this:

dfFinal:

Type    16/17   17/18   18/19
Apple      4       0       7
Pear       5       2       6
Banana     6       5       7

I have tried:

dfFinal = merge(df1, df2, all=TRUE)

but this just creates two "Apple" rows.

And this:

dfFinal = aggregate(.~Type,rbind(df1,setNames(df2,names(df1))),sum)

but this just gives me an error: "numbers of columns of arguments do not match".

dfFinal = cbind(df1[1], df1[-1] + df2[-1])

gives me the error "'+' only defined for equally-sized data frames".

dfFinal = merge(data.frame(df1, row.names=NULL), data.frame(df2, row.names=NULL), 
                by = 0, all = TRUE)[-1]

splits the "Type" column into two.



Any suggestions? This should be easy, but I cannot get it to work.

2 answers:

Answer 0: (score: 2)

Something like this, I guess? I am not sure whether you want Type ordered the way it is in df2.

library(dplyr)
library(tibble)
merge(df1, df2, all=TRUE) %>% group_by(Type) %>% summarise_all(sum,na.rm=TRUE)
# A tibble: 3 x 4
  Type   `18/19` `16/17` `17/18`
  <chr>    <dbl>   <dbl>   <dbl>
1 Apple        7       4       0
2 Banana       7       6       5
3 Pear         6       5       2
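
Why na.rm = TRUE is needed becomes clearer from the intermediate merge() result. The following is only a sketch: it uses plain base R data frames built with check.names = FALSE as stand-ins for the tibbles in the question, and rowsum() in place of the dplyr summary.

```r
# Stand-ins for df1 and df2; check.names = FALSE keeps names such as "18/19" intact.
df1 <- data.frame(Type = "Apple", `18/19` = 5, check.names = FALSE)
df2 <- data.frame(
  Type = c("Apple", "Pear", "Banana"),
  `16/17` = c(4, 5, 6), `17/18` = c(0, 2, 5), `18/19` = c(2, 6, 7),
  check.names = FALSE
)

# merge() joins on every shared column, here both Type and 18/19.
# The two Apple rows differ in 18/19, so both survive, and the row coming
# from df1 gets NA in the columns that only df2 has.
merged <- merge(df1, df2, all = TRUE)
print(merged)

# Summing per Type while dropping NAs collapses the two Apple rows into one.
value_cols <- setdiff(names(merged), "Type")
summed <- rowsum(merged[value_cols], group = merged$Type, na.rm = TRUE)
print(summed)
```

Without na.rm = TRUE, the NA values in the Apple row from df1 would turn the Apple sums for 16/17 and 17/18 into NA.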

If you do need that ordering, you have to do this:

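rowlvl <- df2$Type        # desired row order
collvl <- colnames(df2)   # desired column order
merge(df1, df2, all = TRUE) %>%
  select(all_of(collvl)) %>%                      # all_of() avoids the ambiguous-vector warning
  mutate(Type = factor(Type, levels = rowlvl)) %>%
  group_by(Type) %>%
  summarise_all(sum, na.rm = TRUE)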

# A tibble: 3 x 4
  Type   `16/17` `17/18` `18/19`
  <fct>    <dbl>   <dbl>   <dbl>
1 Apple        4       0       7
2 Pear         5       2       6
3 Banana       6       5       7

Answer 1: (score: 1)

This will probably be much easier if you convert the data from wide to long format and then combine them.

This solution requires tidyr version 1.0.0 or later, which is where pivot_longer() and pivot_wider() were introduced.
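
To make the wide-to-long idea concrete, here is a small sketch of what pivot_longer() does to each frame on its own. The base R data frames and the names long1 and long2 are stand-ins chosen for this illustration.

```r
library(tidyr)  # pivot_longer() needs tidyr >= 1.0.0

# Stand-ins for df1 and df2; check.names = FALSE keeps names such as "18/19" intact.
df1 <- data.frame(Type = "Apple", `18/19` = 5, check.names = FALSE)
df2 <- data.frame(
  Type = c("Apple", "Pear", "Banana"),
  `16/17` = c(4, 5, 6), `17/18` = c(0, 2, 5), `18/19` = c(2, 6, 7),
  check.names = FALSE
)

# Every non-Type column becomes one (years, count) pair per row.
long1 <- pivot_longer(df1, cols = -Type, names_to = "years", values_to = "count")
long2 <- pivot_longer(df2, cols = -Type, names_to = "years", values_to = "count")

print(long1)  # 1 row: the single Apple value for 18/19
print(long2)  # 9 rows: 3 types x 3 years
```

Once both frames have the same three columns (Type, years, count), they can be stacked and summed per Type and year regardless of how many columns each had originally.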


library(tidyr)
library(dplyr)

df1 <- data_frame("Type" = "Apple", "18/19" = 5)
df2 <- data_frame(
  "Type" = c("Apple", "Pear", "Banana"),
  "16/17" = c(4, 5, 6), "17/18" = c(0, 2, 5), "18/19" = c(2, 6, 7)
)


df_final <- bind_rows(
  df1 %>%
    # pivoting to make the shapes of both data frames the same
    pivot_longer(
      cols = -Type,
      names_to = "years",
      values_to = "count"
    ),
  df2 %>%
    # pivoting to make the shapes of both data frames the same
    pivot_longer(
      cols = -Type,
      names_to = "years",
      values_to = "count"
    )
) %>%
  group_by(Type, years) %>%
  summarise(count = sum(count)) %>%
  # pivot again to convert back to wide format as answer required
  pivot_wider(
    names_from = years,
    values_from = count
  )
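
Because group_by() sorts its groups, df_final comes back with Type in alphabetical order (Apple, Banana, Pear) rather than in the order used in df2. If the original order matters, one possible follow-up is to reindex the rows with match(). The data frame below is a hand-built stand-in for df_final, and wanted_order and df_ordered are names made up for this sketch.

```r
# Hand-built stand-in for the df_final produced above (rows in alphabetical order).
df_final <- data.frame(
  Type = c("Apple", "Banana", "Pear"),
  `16/17` = c(4, 6, 5), `17/18` = c(0, 5, 2), `18/19` = c(7, 7, 6),
  check.names = FALSE
)

# The row order from df2$Type in the question.
wanted_order <- c("Apple", "Pear", "Banana")

# match() gives, for each wanted Type, its row position in df_final.
df_ordered <- df_final[match(wanted_order, df_final$Type), ]
rownames(df_ordered) <- NULL
print(df_ordered)
```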