I have two data frames.
df1
Year Farm 1 Farm 2 Farm 3
2015 1000 2000 1500
2016 500 2000 1000
df2
Year Month Farm 1 Farm 2 Farm 3
2015 Jan 1 1 3
2015 Feb 1 2 1
2016 Jan 2 2 2
2016 Feb 2 1 2
I want to multiply each farm's monthly values in df2 by that farm's annual value in df1, matching on Year, so that the output is...
df3
Year Month Farm 1 Farm 2 Farm 3
2015 Jan 1000 2000 4500
2015 Feb 1000 4000 1500
2016 Jan 1000 4000 2000
2016 Feb 1000 2000 2000
I have the years formatted correctly, but I keep getting stuck looking for a solution with group_by in dplyr. Should I try a different approach?
Answer 0 (score: 1)
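1) Base R. Assuming df1 and df2 as shown reproducibly in the Note at the end, merge them to give the data frame m. Then create a new data frame df3 by replacing all but the first two columns of df2 with the product of those same columns of df2 and the corresponding columns of m. No packages are used.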
m <- merge(df2, df1, by = 1)
df3 <- replace(df2, -(1:2), df2[-(1:2)] * m[-(1:ncol(df2))] )
giving:
> df3
Year Month Farm1 Farm2 Farm3
1 2015 Jan 1000 2000 4500
2 2015 Feb 1000 4000 1500
3 2016 Jan 1000 4000 2000
4 2016 Feb 1000 2000 2000
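One caveat worth noting: merge() sorts its result by the key by default, so the replace() call above lines up row by row only because df2 is already ordered by Year. The following is a minimal base R sketch, not part of the original answer, that looks up each df2 row's year in df1 with match() instead, so it does not depend on row order. The names farm_cols and idx are illustrative, and the data frames are rebuilt inline so the block runs on its own.

```r
# Inputs rebuilt inline from the question's tables
df1 <- data.frame(Year  = c(2015, 2016),
                  Farm1 = c(1000, 500),
                  Farm2 = c(2000, 2000),
                  Farm3 = c(1500, 1000))
df2 <- data.frame(Year  = c(2015, 2015, 2016, 2016),
                  Month = c("Jan", "Feb", "Jan", "Feb"),
                  Farm1 = c(1, 1, 2, 2),
                  Farm2 = c(1, 2, 2, 1),
                  Farm3 = c(3, 1, 2, 2))

# Farm columns are everything in df1 except the key (illustrative name)
farm_cols <- setdiff(names(df1), "Year")

# For each row of df2, the position of its Year in df1
idx <- match(df2$Year, df1$Year)

# Multiply the monthly values by the matching annual values, column by column
df3 <- df2
df3[farm_cols] <- df2[farm_cols] * df1[idx, farm_cols]
df3
```

If a year in df2 has no counterpart in df1, match() returns NA for that row and the products for that row are NA.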
2) sqldf. If you have only a few farms, so that writing each one out is feasible, then:
library(sqldf)
sqldf("select
Year,
b.Month,
a.Farm1 * b.Farm1 Farm1,
a.Farm2 * b.Farm2 Farm2,
a.Farm3 * b.Farm3 Farm3
from df2 b left join df1 a using (Year)")
giving:
Year Month Farm1 Farm2 Farm3
1 2015 Jan 1000 2000 4500
2 2015 Feb 1000 4000 1500
3 2016 Jan 1000 4000 2000
4 2016 Feb 1000 2000 2000
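With many farms, the same query could be generated rather than typed. The sketch below is an assumption layered on the answer above, not something it contains: it builds the select list with sprintf() in base R, and the names farm_cols, products and query are illustrative. Only the string construction is run here; passing the string to sqldf() is shown as a comment because it needs the sqldf package and the df1 and df2 data frames.

```r
# Farm columns to multiply (illustrative; could also come from names(df1)[-1])
farm_cols <- c("Farm1", "Farm2", "Farm3")

# One "a.FarmN * b.FarmN FarmN" term per farm column
products <- sprintf("a.%s * b.%s %s", farm_cols, farm_cols, farm_cols)

# Assemble the full statement in the same shape as the hand-written query
query <- paste0(
  "select Year, b.Month, ",
  paste(products, collapse = ", "),
  " from df2 b left join df1 a using (Year)"
)
cat(query, "\n")

# To run it (assumes sqldf is installed and df1, df2 exist):
# library(sqldf)
# df3 <- sqldf(query)
```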
Note
The inputs in reproducible form:
Lines1 <- "
Year Farm1 Farm2 Farm3
2015 1000 2000 1500
2016 500 2000 1000"
Lines2 <- "
Year Month Farm1 Farm2 Farm3
2015 Jan 1 1 3
2015 Feb 1 2 1
2016 Jan 2 2 2
2016 Feb 2 1 2"
df1 <- read.table(text = Lines1, header = TRUE)
df2 <- read.table(text = Lines2, header = TRUE)
Answer 1 (score: 1)
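Here is a join option with data.table. Join the second dataset ('df2') with the first ('df1') on the 'Year' column, multiply .SD (restricted by .SDcols) by the corresponding columns from the first dataset, and assign (:=) the output to update the 'Farm' columns in the second dataset.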
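A minimal sketch of the kind of update join described above, assuming the data.table package is installed. It is not the answer's original code; the names dt1, dt2 and farm_cols are illustrative, and the inputs are rebuilt inline as integer columns so that the products can be assigned back into the existing columns without a type change.

```r
library(data.table)

# Inputs rebuilt inline from the question's tables (integer columns)
dt1 <- data.table(Year  = c(2015L, 2016L),
                  Farm1 = c(1000L, 500L),
                  Farm2 = c(2000L, 2000L),
                  Farm3 = c(1500L, 1000L))
dt2 <- data.table(Year  = c(2015L, 2015L, 2016L, 2016L),
                  Month = c("Jan", "Feb", "Jan", "Feb"),
                  Farm1 = c(1L, 1L, 2L, 2L),
                  Farm2 = c(1L, 2L, 2L, 1L),
                  Farm3 = c(3L, 1L, 2L, 2L))

farm_cols <- c("Farm1", "Farm2", "Farm3")  # illustrative name

# Update join: for the rows of dt2 matched by Year, multiply each Farm column
# (.SD, restricted by .SDcols) by the same-named column of dt1, which is
# reachable in j through the "i." prefix, and assign the result by reference.
dt2[dt1,
    (farm_cols) := Map(`*`, .SD, mget(paste0("i.", farm_cols))),
    on = "Year",
    .SDcols = farm_cols]
dt2
```

Because := modifies dt2 in place, take a copy() first if the original monthly values are still needed.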
Answer 2 (score: 0)
I would approach this by converting the data frames to long format, joining them, and then doing the calculation. Here is an example:
# Load packages
library(dplyr)
library(tidyr)
# Make-up data
df1 = data.frame(Year = 2008:2018,
                 Farm1 = runif(n = 11, min = 0, max = 2000),
                 Farm2 = runif(n = 11, min = 0, max = 2000),
                 Farm3 = runif(n = 11, min = 0, max = 2000))
df2 = expand.grid(Year = 2008:2018,
                  Month = month.abb[1:12]) %>%
  mutate(Farm1 = runif(n = 132, min = 0, max = 10),
         Farm2 = runif(n = 132, min = 0, max = 10),
         Farm3 = runif(n = 132, min = 0, max = 10))
# Transform data into long format
df1.long = df1 %>%
  gather(key = Farm, value = AnnualValue, Farm1:Farm3)
df2.long = df2 %>%
  gather(key = Farm, value = Value, Farm1:Farm3)
# Now left_join on Year and Farm (the columns the two long tables share), then multiply
df.comb = left_join(df1.long, df2.long, by = c("Year", "Farm")) %>%
  mutate(NewValue = Value * AnnualValue)
# Transform back to wide format (if necessary)
df.comb.wide = df.comb %>%
  select(-AnnualValue, -Value) %>% # drop values not included in wide format
  spread(key = Farm, value = NewValue)
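In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a sketch of the same long-format approach written with those functions and applied to the question's data, assuming dplyr and tidyr (1.0.0 or later) are installed. The names df1_long, df2_long and df3 are illustrative.

```r
library(dplyr)
library(tidyr)

# Inputs rebuilt inline from the question's tables
df1 <- data.frame(Year  = c(2015, 2016),
                  Farm1 = c(1000, 500),
                  Farm2 = c(2000, 2000),
                  Farm3 = c(1500, 1000))
df2 <- data.frame(Year  = c(2015, 2015, 2016, 2016),
                  Month = c("Jan", "Feb", "Jan", "Feb"),
                  Farm1 = c(1, 1, 2, 2),
                  Farm2 = c(1, 2, 2, 1),
                  Farm3 = c(3, 1, 2, 2))

# Long format: one row per Year/Farm (annual) and per Year/Month/Farm (monthly)
df1_long <- pivot_longer(df1, cols = -Year,
                         names_to = "Farm", values_to = "AnnualValue")
df2_long <- pivot_longer(df2, cols = -c(Year, Month),
                         names_to = "Farm", values_to = "Value")

# Join on Year and Farm, multiply, then return to wide format
df3 <- df2_long %>%
  left_join(df1_long, by = c("Year", "Farm")) %>%
  mutate(NewValue = Value * AnnualValue) %>%
  select(-Value, -AnnualValue) %>%
  pivot_wider(names_from = Farm, values_from = NewValue)
df3
```

Starting the join from the monthly table keeps one output row per Year and Month, in the same order as df2.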