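I am trying to add together two data frames that share the same row/column names but have different numbers of columns/rows, and I am struggling with it.
Creating the data frames: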
library(tibble)  # provides tibble(); data_frame() is deprecated
df1 = tibble('Type' = 'Apple', '18/19' = 5)
df2 = tibble('Type' = c('Apple', 'Pear', 'Banana'),
             '16/17' = c(4,5,6), '17/18' = c(0,2,5), '18/19' = c(2,6,7))
df1:
Type    18/19
Apple   5
df2:
Type    16/17  17/18  18/19
Apple   4      0      2
Pear    5      2      6
Banana  6      5      7
What I want to end up with is this:
dfFinal:
Type    16/17  17/18  18/19
Apple   4      0      7
Pear    5      2      6
Banana  6      5      7
I have tried:
dfFinal = merge(df1, df2, all=TRUE)
But this just creates two "Apple" rows.
And this:
dfFinal = aggregate(.~Type,rbind(df1,setNames(df2,names(df1))),sum)
But this just gives me an error: "numbers of columns of arguments do not match"
dfFinal = cbind(df1[1], df1[-1] + df2[-1])
gives me the error "'+' only defined for equally-sized data frames"
dfFinal = merge(data.frame(df1, row.names=NULL), data.frame(df2, row.names=NULL),
by = 0, all = TRUE)[-1]
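splits the "Type" column in two.
Any suggestions? This should be easy, but I can't get it to work.
Answer 0: (score: 2)
Something like this, I guess? I'm not sure whether you want Type ordered the way it is in df2.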
library(dplyr)
library(tibble)
merge(df1, df2, all=TRUE) %>% group_by(Type) %>% summarise_all(sum,na.rm=TRUE)
# A tibble: 3 x 4
Type `18/19` `16/17` `17/18`
<chr> <dbl> <dbl> <dbl>
1 Apple 7 4 0
2 Banana 7 6 5
3 Pear 6 5 2
If you do need the rows and columns in the same order as df2, you would have to do this:
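rowlvl <- df2$Type       # desired row order
collvl <- colnames(df2)  # desired column order
# all_of() selects columns named in an external character vector
merge(df1, df2, all=TRUE) %>% select(all_of(collvl)) %>% mutate(Type=factor(Type,levels=rowlvl)) %>%
group_by(Type) %>% summarise_all(sum,na.rm=TRUE)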
# A tibble: 3 x 4
Type `16/17` `17/18` `18/19`
<fct> <dbl> <dbl> <dbl>
1 Apple 4 0 7
2 Pear 5 2 6
3 Banana 6 5 7
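A minimal sketch of why the merge-then-sum approach above works, assuming df1 and df2 as defined in the question (rebuilt here with tibble() so the block runs on its own). Both tables contain Type and 18/19, so merge() joins on both columns by default. The two Apple rows have different 18/19 values (5 and 2), so they stay separate, with NA in the columns that df1 lacks. The grouped sum with na.rm = TRUE then collapses them into one row. The names merged, apple_rows and summed are illustrative only, and across() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

df1 <- tibble(Type = "Apple", `18/19` = 5)
df2 <- tibble(
  Type = c("Apple", "Pear", "Banana"),
  `16/17` = c(4, 5, 6), `17/18` = c(0, 2, 5), `18/19` = c(2, 6, 7)
)

# merge() joins on every shared column (here Type and 18/19) by default,
# so the two Apple rows do not match each other and both are kept
merged <- merge(df1, df2, all = TRUE)
apple_rows <- merged[merged$Type == "Apple", ]

# Summing per Type with na.rm = TRUE collapses the duplicate Apple rows
summed <- merged %>%
  group_by(Type) %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))
```

Printing apple_rows shows the two unmatched Apple rows, which is the behaviour the question ran into with merge() alone.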
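Answer 1: (score: 1)
This will probably be much easier if you convert the data from wide to long format first and then combine them.
This solution requires tidyr 1.0.0 or later, which introduced pivot_longer() and pivot_wider().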
library(tidyr)
library(dplyr)
# tibble() replaces the deprecated data_frame(); dplyr re-exports it
df1 <- tibble("Type" = "Apple", "18/19" = 5)
df2 <- tibble(
"Type" = c("Apple", "Pear", "Banana"),
"16/17" = c(4, 5, 6), "17/18" = c(0, 2, 5), "18/19" = c(2, 6, 7)
)
df_final <- bind_rows(
df1 %>%
# pivoting to make the shapes of both data frames the same
pivot_longer(
cols = -Type,
names_to = "years",
values_to = "count"
),
df2 %>%
# pivoting to make the shapes of both data frames the same
pivot_longer(
cols = -Type,
names_to = "years",
values_to = "count"
)
) %>%
group_by(Type, years) %>%
summarise(count = sum(count)) %>%
# pivot again to convert back to wide format as answer required
pivot_wider(
names_from = years,
values_from = count
)
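A compact variant of the pipeline above, offered as a sketch rather than as the answer's own code. It assumes dplyr 1.0.0 or later (for the .groups argument) and introduces a hypothetical helper, to_long(), to avoid repeating the pivot_longer() call. Because group_by() sorts Type alphabetically, the last step reorders the rows to match df2, which is the layout the question asked for.

```r
library(dplyr)
library(tidyr)

df1 <- tibble(Type = "Apple", `18/19` = 5)
df2 <- tibble(
  Type = c("Apple", "Pear", "Banana"),
  `16/17` = c(4, 5, 6), `17/18` = c(0, 2, 5), `18/19` = c(2, 6, 7)
)

# Hypothetical helper (not part of the answer): reshape one table to long form
to_long <- function(df) {
  pivot_longer(df, cols = -Type, names_to = "years", values_to = "count")
}

df_final <- bind_rows(to_long(df1), to_long(df2)) %>%
  group_by(Type, years) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarise(count = sum(count), .groups = "drop") %>%
  pivot_wider(names_from = years, values_from = count) %>%
  # Restore the row order of df2 (the grouped summary sorts Type alphabetically)
  arrange(match(Type, df2$Type))
```

In long format every (Type, years) pair that exists has a value, so no na.rm = TRUE is needed when summing.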