Running difference by group across multiple columns

Time: 2017-12-01 16:25:35

Tags: r dplyr

I have a data frame like the following:

Date <- as.Date(c('2017-10-16',
                  '2017-10-16',
                  '2017-10-17',
                  '2017-10-17',
                  '2017-10-18',
                  '2017-10-18',
                  '2017-10-19',
                  '2017-10-19',
                  '2017-10-20',
                  '2017-10-20'))

Source <- as.Date(c('2017-11-29',
                    '2017-11-30',
                    '2017-11-29',
                    '2017-11-30',
                    '2017-11-29',
                    '2017-11-30',
                    '2017-11-29',
                    '2017-11-30',
                    '2017-11-29',
                    '2017-11-30'))

Revenue <- c(206.88,
             210.88,
             194.13,
             200.13,
             170.00,
             170.00,
             746.65,
             736.65,
             772.00,
             772.00)

Cost <- c(100.88,
           10.88,
           85.13,
          100.13,
          170.00,
          100.00,
           46.65,
           50.65,
           23.00,
           24.00)

df <- data.frame(Date, Source, Revenue, Cost)

The data frame:

df
         Date         Source     Revenue       Cost
1  2017-10-16     2017-11-29      206.88     100.88
2  2017-10-16     2017-11-30      210.88      10.88
3  2017-10-17     2017-11-29      194.13      85.13
4  2017-10-17     2017-11-30      200.13     100.13
5  2017-10-18     2017-11-29      170.00     170.00
6  2017-10-18     2017-11-30      170.00     100.00
7  2017-10-19     2017-11-29      746.65      46.65
8  2017-10-19     2017-11-30      736.65      50.65
9  2017-10-20     2017-11-29      772.00      23.00
10 2017-10-20     2017-11-30      772.00      24.00

How can I compute a running difference within each Date, and do this for every column after the second one?

The final result should look like this:

         Date         Source     Revenue       Cost    Revenue_Diff     Cost_Diff    .....................
1  2017-10-16     2017-11-29      206.88     100.88          NA           NA      .....................
2  2017-10-16     2017-11-30      210.88      10.88           4          -90      .....................
3  2017-10-17     2017-11-29      194.13      85.13          NA           NA     .....................
4  2017-10-17     2017-11-30      200.13     100.13           6           15   .....................
5  2017-10-18     2017-11-29      170.00     170.00          NA           NA   .....................
6  2017-10-18     2017-11-30      170.00     100.00           0          -70   .....................
7  2017-10-19     2017-11-29      746.65      46.65          NA           NA   .....................
8  2017-10-19     2017-11-30      736.65      50.65         -10            4   .....................
9  2017-10-20     2017-11-29      772.00      23.00          NA           NA   .....................
10 2017-10-20     2017-11-30      772.00      24.00           0            1   .....................

My current script handles only one column at a time, but I would like to apply it to every column to the right of Source:

library(dplyr)

test <- df %>%
  group_by(Date) %>%
  mutate(Revenue_Diff = c(NA, diff(Revenue)))

Any help would be great, thanks!

1 answer:

Answer 0: (score: 1)

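A solution using dplyr. We can use mutate_at to specify the columns to operate on. lag shifts each column down by one row within the group, so subtracting it from the original column gives the difference from the previous row, with NA in the first row of each group.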
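As a small illustration of why `. - lag(.)` matches the `c(NA, diff(...))` approach from the question, here is a sketch on a made-up vector (the values in `x` are hypothetical and not taken from the data frame above):

```r
library(dplyr)

# Hypothetical values for one group; not from the question's data.
x <- c(10, 14, 9)

# dplyr::lag() shifts the vector down by one position and pads with NA.
shifted <- dplyr::lag(x)        # NA 10 14

# Subtracting the shifted vector gives the row-to-row difference.
via_lag <- x - shifted          # NA  4 -5

# The question's approach: diff() drops the first element, so NA is prepended.
via_diff <- c(NA, diff(x))      # NA  4 -5

print(via_lag)
print(via_diff)
```

Both forms give the same result here; the lag form is just easier to apply to many columns at once.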

library(dplyr)

df2 <- df %>%
  group_by(Date) %>%
  mutate_at(vars(-Source), funs(Diff = . - lag(.))) %>%
  ungroup()
df2

# # A tibble: 10 x 6
#          Date     Source Revenue   Cost Revenue_Diff Cost_Diff
#        <date>     <date>   <dbl>  <dbl>        <dbl>     <dbl>
#  1 2017-10-16 2017-11-29  206.88 100.88           NA        NA
#  2 2017-10-16 2017-11-30  210.88  10.88            4       -90
#  3 2017-10-17 2017-11-29  194.13  85.13           NA        NA
#  4 2017-10-17 2017-11-30  200.13 100.13            6        15
#  5 2017-10-18 2017-11-29  170.00 170.00           NA        NA
#  6 2017-10-18 2017-11-30  170.00 100.00            0       -70
#  7 2017-10-19 2017-11-29  746.65  46.65           NA        NA
#  8 2017-10-19 2017-11-30  736.65  50.65          -10         4
#  9 2017-10-20 2017-11-29  772.00  23.00           NA        NA
# 10 2017-10-20 2017-11-30  772.00  24.00            0         1
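
On newer dplyr versions, funs() has been deprecated since 0.8.0, and the scoped verbs such as mutate_at have been superseded by across() since 1.0.0. The following is a sketch of what the same idea might look like with across(), assuming dplyr 1.0.0 or later. It rebuilds only the first four rows of the question's data so that it runs on its own; the name `df3` is just an illustrative choice.

```r
library(dplyr)

# First four rows of the question's data, rebuilt so the example is self-contained.
df <- data.frame(
  Date    = as.Date(c("2017-10-16", "2017-10-16", "2017-10-17", "2017-10-17")),
  Source  = as.Date(c("2017-11-29", "2017-11-30", "2017-11-29", "2017-11-30")),
  Revenue = c(206.88, 210.88, 194.13, 200.13),
  Cost    = c(100.88, 10.88, 85.13, 100.13)
)

df3 <- df %>%
  group_by(Date) %>%
  # across() never selects grouping columns, so -Source leaves Revenue and Cost.
  # .names appends "_Diff" to each original column name.
  mutate(across(-Source, ~ .x - dplyr::lag(.x), .names = "{.col}_Diff")) %>%
  ungroup()

print(df3)
```

Note that lag compares each row with the previous row in the current row order, so if the rows are not already ordered by Source within each Date, adding arrange(Date, Source) before group_by may be needed.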