Multiplying a year-based vector by a year-and-month-based matrix in R

Time: 2018-10-13 21:51:05

Tags: r sorting date split dplyr

I have two data frames

df1 

Year  Farm 1  Farm 2  Farm 3
2015    1000    2000    1500
2016    500     2000    1000

df2

Year Month  Farm 1 Farm 2 Farm 3
2015  Jan    1        1      3
2015  Feb    1        2      1
2016  Jan    2        2      2
2016  Feb    2        1      2

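I want to multiply the monthly values for each farm in df2 by that farm's annual value in df1, matched by year, so that the output is...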

df3

Year    Month   Farm 1      Farm 2      Farm 3
2015    Jan     1000        2000        4500
2015    Feb     1000        4000        1500
2016    Jan     1000        4000        2000
2016    Feb     1000        2000        2000

I have the years formatted correctly, but I have been stuck looking for a solution with group_by in dplyr. Should I try a different approach?

3 Answers:

Answer 0: (score: 1)

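1) Base R. Assuming df1 and df2 as shown reproducibly in the Note at the end, merge the two data frames to give data frame m. Then create a new data frame df3 by replacing all but the first two columns of df2 with the product of those same columns of df2 and the corresponding columns of m. No packages are used.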

m <- merge(df2, df1, by = 1)
df3 <- replace(df2, -(1:2), df2[-(1:2)] * m[-(1:ncol(df2))] )

giving:

> df3
  Year Month Farm1 Farm2 Farm3
1 2015   Jan  1000  2000  4500
2 2015   Feb  1000  4000  1500
3 2016   Jan  1000  4000  2000
4 2016   Feb  1000  2000  2000
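
One caveat worth noting: merge() sorts its result on the key column by default, so the row-wise product above lines up only when the rows of df2 are already ordered by Year, as they are here. A minimal sketch of an order-independent variant, assuming the same df1 and df2 as in the Note (the names farm_cols and idx are introduced here for illustration only), looks up each df2 row's year in df1 with match():

```r
# Same data as in the Note at the end of this answer
df1 <- read.table(text = "
Year Farm1 Farm2 Farm3
2015 1000 2000 1500
2016 500 2000 1000", header = TRUE)

df2 <- read.table(text = "
Year Month Farm1 Farm2 Farm3
2015 Jan 1 1 3
2015 Feb 1 2 1
2016 Jan 2 2 2
2016 Feb 2 1 2", header = TRUE)

# Hypothetical helper names: every df1 column except the key
farm_cols <- setdiff(names(df1), "Year")

# For each row of df2, the position of its Year in df1
idx <- match(df2$Year, df1$Year)

# Multiply the monthly values by the matching annual row, column by column
df3 <- df2
df3[farm_cols] <- df2[farm_cols] * df1[idx, farm_cols]
df3
```

Because the lookup is done per row, this keeps df2's row order whatever it is. A year in df2 that is missing from df1 would produce NA in idx and therefore NA products.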

2) sqldf. If you only have a few farms, it is feasible to write each one out:

library(sqldf)

sqldf("select 
         Year, 
         b.Month, 
         a.Farm1 * b.Farm1 Farm1,
         a.Farm2 * b.Farm2 Farm2,
         a.Farm3 * b.Farm3 Farm3
       from df2 b left join df1 a using (Year)")

giving:

  Year Month Farm1 Farm2 Farm3
1 2015   Jan  1000  2000  4500
2 2015   Feb  1000  4000  1500
3 2016   Jan  1000  4000  2000
4 2016   Feb  1000  2000  2000

Note

Lines1 <- "
Year  Farm1  Farm2  Farm3
2015    1000    2000    1500
2016    500     2000    1000"

Lines2 <- "
Year Month  Farm1 Farm2 Farm3
2015  Jan    1        1      3
2015  Feb    1        2      1
2016  Jan    2        2      2
2016  Feb    2        1      2"

df1 <- read.table(text = Lines1, header = TRUE)
df2 <- read.table(text = Lines2, header = TRUE)

Answer 1: (score: 1)

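Here is a join option from data.table. Join the second dataset ('df2') with the first ('df1') on the 'Year' column, then multiply .SD (the subset of the data.table selected by .SDcols) by the corresponding columns of the first dataset, and assign (:=) the output to update the 'Farm' columns of the second dataset.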
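
The code for this answer did not survive in the text above, so what follows is only a reconstruction sketch of the approach described, not the answerer's original code. It assumes the df1 and df2 from the Note in Answer 0; the names dt1, dt2 and farm_cols are introduced here for illustration. In a data.table update join, columns coming from the i table can be referenced with an `i.` prefix.

```r
library(data.table)

# Same data as in the Note of Answer 0
df1 <- read.table(text = "
Year Farm1 Farm2 Farm3
2015 1000 2000 1500
2016 500 2000 1000", header = TRUE)

df2 <- read.table(text = "
Year Month Farm1 Farm2 Farm3
2015 Jan 1 1 3
2015 Feb 1 2 1
2016 Jan 2 2 2
2016 Feb 2 1 2", header = TRUE)

# Work on data.table copies so the original data frames stay untouched
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)

# Hypothetical helper: the columns to be updated
farm_cols <- c("Farm1", "Farm2", "Farm3")

# Update join: for the rows of dt2 matching dt1 on Year, multiply each
# Farm column (.SD) by the same-named column of dt1 (prefix "i.") and
# write the result back into dt2 by reference
dt2[dt1,
    (farm_cols) := Map(`*`, .SD, mget(paste0("i.", farm_cols))),
    on = "Year",
    .SDcols = farm_cols]

dt2
```

Note that `:=` modifies dt2 in place. The annual and monthly values here are both integer, so the column types are unchanged by the update; with mixed integer and double columns, data.table may need the target columns converted first.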

Answer 2: (score: 0)

I would approach this by converting the data frames to long format, joining them, and then doing the calculation. Here is an example:

# Load packages
library(dplyr)
library(tidyr)

# Make-up data
df1 = data.frame(Year = 2008:2018,
                 Farm1 = runif(n = 11, min = 0, max = 2000),
                 Farm2 = runif(n = 11, min = 0, max = 2000),
                 Farm3 = runif(n = 11, min = 0, max = 2000))

df2 = expand.grid(Year = 2008:2018,
                  Month = month.abb[1:12]) %>% 
  mutate(Farm1 = runif(n = 132, min = 0, max = 10),
         Farm2 = runif(n = 132, min = 0, max = 10),
         Farm3 = runif(n = 132, min = 0, max = 10))

# Transform data into long format
df1.long = df1 %>%
  gather(key = Farm, value = AnnualValue, Farm1:Farm3)

df2.long = df2 %>%
  gather(key = Farm, value = Value, Farm1:Farm3)

# Now left_join on Year and Farm (both key columns) and multiply columns
df.comb = left_join(df1.long, df2.long, by = c("Year", "Farm")) %>% 
  mutate(NewValue = Value * AnnualValue)

# Transform back to wide format (if necessary)
df.comb.wide = df.comb %>% 
  select(-AnnualValue, -Value) %>% # drop values not included in wide format
  spread(key = Farm, value = NewValue)
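
For comparison, here is a sketch of the same long-format idea applied to the question's own data instead of random values. It assumes tidyr 1.0.0 or later, where pivot_longer() and pivot_wider() are available as the newer counterparts of gather() and spread(); the names df1_long and df2_long are illustrative only.

```r
library(dplyr)
library(tidyr)

# The question's data, typed in by hand
df1 <- data.frame(Year  = c(2015, 2016),
                  Farm1 = c(1000, 500),
                  Farm2 = c(2000, 2000),
                  Farm3 = c(1500, 1000))

df2 <- data.frame(Year  = c(2015, 2015, 2016, 2016),
                  Month = c("Jan", "Feb", "Jan", "Feb"),
                  Farm1 = c(1, 1, 2, 2),
                  Farm2 = c(1, 2, 2, 1),
                  Farm3 = c(3, 1, 2, 2))

# Long format: one row per Year (and Month) per Farm
df1_long <- pivot_longer(df1, cols = starts_with("Farm"),
                         names_to = "Farm", values_to = "AnnualValue")
df2_long <- pivot_longer(df2, cols = starts_with("Farm"),
                         names_to = "Farm", values_to = "Value")

# Join the annual value onto each monthly row, multiply, and go back to wide
df3 <- df2_long %>%
  left_join(df1_long, by = c("Year", "Farm")) %>%
  mutate(NewValue = Value * AnnualValue) %>%
  select(-Value, -AnnualValue) %>%
  pivot_wider(names_from = Farm, values_from = NewValue)

df3
```

Starting the join from the monthly table keeps one output row per Year and Month, which is the shape the question asks for.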