When I have tidy data like this dummy example:
temp <- structure(list(year = c(2017L, 2018L, 2019L, 2020L, 2017L, 2018L,
2019L, 2020L), figure = c("income", "income", "income", "income",
"expenses", "expenses", "expenses", "expenses"), value = c(10,
11, 10, 13, 5, 4, 4, 4)), row.names = c(NA, -8L), .Names = c("year",
"figure", "value"), class = "data.frame")
that is:
year figure value
1 2017 income 10
2 2018 income 11
3 2019 income 10
4 2020 income 13
5 2017 expenses 5
6 2018 expenses 4
7 2019 expenses 4
8 2020 expenses 4
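and I want to calculate the profit for each year (income - expenses), I use the following approach:
library(dplyr)
library(tidyr)
temp %>%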
spread(figure, value) %>%
mutate(profit = income - expenses) %>%
gather(figure, value, -year)
and the output is:
year figure value
1 2017 expenses 5
2 2018 expenses 4
3 2019 expenses 4
4 2020 expenses 4
5 2017 income 10
6 2018 income 11
7 2019 income 10
8 2020 income 13
9 2017 profit 5
10 2018 profit 7
11 2019 profit 6
12 2020 profit 9
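I reshape the table to wide format, operate across the columns, and then reshape the data back to long format.
Is it possible to do the same operation with group_by(), without converting to wide format and then back to long format?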
Edit:
I have the following data.frame:
temp <- structure(list(year = c(2017L, 2018L, 2019L, 2020L, 2017L, 2018L,
2019L, 2020L, 2017L, 2018L, 2019L, 2020L, 2017L, 2018L, 2019L,
2020L), figure = c("income", "income", "income", "income", "expenses",
"expenses", "expenses", "expenses", "income", "income", "income",
"income", "expenses", "expenses", "expenses", "expenses"), value = c(10,
11, 10, 13, 5, 4, 4, 4, 10, 11, 10, 13, 5, 4, 4, 4), company = c("A",
"A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B",
"B", "B")), .Names = c("year", "figure", "value", "company"), row.names = c(NA,
-16L), class = "data.frame")
and I do this:
temp %>%
filter(company == "A") %>%
group_by(year, company) %>%
summarise(value = value[figure == 'income'] - value[figure == 'expenses'],
figure = 'profit') %>%
bind_rows(temp, .)
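The final output contains both company "A" and company "B", although after the filter it should contain only company "A". This example shows that binding the result to the original data.frame is not a good idea if we modify the data before summarising.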
Answer 0 (score: 1)
For each year, you can subtract the "expenses" value from the "income" value and bind the result to the original data frame (called df here).
library(dplyr)
df %>%
group_by(year) %>%
summarise(value = value[figure == 'income'] - value[figure == 'expenses'],
figure = 'profit') %>%
bind_rows(df, .)
# year figure value
#1 2017 income 10
#2 2018 income 11
#3 2019 income 10
#4 2020 income 13
#5 2017 expenses 5
#6 2018 expenses 4
#7 2019 expenses 4
#8 2020 expenses 4
#9 2017 profit 5
#10 2018 profit 7
#11 2019 profit 6
#12 2020 profit 9
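We can also use diff to subtract the values after arranging the data by year and figure. Within each year, "expenses" sorts before "income", so diff(value) returns income minus expenses.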
df %>%
arrange(year, figure) %>%
group_by(year) %>%
summarise(value = diff(value),
figure = 'profit') %>%
bind_rows(df, .)
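For the multi-company data in the question's edit, one possible adaptation is to store the filtered subset and bind the summary to that subset rather than to the full data frame. This is only a sketch: the names company_a and result are made up for illustration, and the data is rebuilt with data.frame() and rep() so the block runs on its own. It assumes exactly one "income" row and one "expenses" row per company and year.

```r
library(dplyr)

# Same 16 rows as the data.frame in the question's edit
temp <- data.frame(
  year = rep(2017:2020, times = 4),
  figure = rep(rep(c("income", "expenses"), each = 4), times = 2),
  value = rep(c(10, 11, 10, 13, 5, 4, 4, 4), times = 2),
  company = rep(c("A", "B"), each = 8),
  stringsAsFactors = FALSE
)

# Filter once and keep the subset (company_a is a hypothetical name)
company_a <- temp %>%
  filter(company == "A")

# Compute the profit per company and year, then bind it to the
# filtered subset instead of the unfiltered temp
result <- company_a %>%
  group_by(company, year) %>%
  summarise(value = value[figure == "income"] - value[figure == "expenses"],
            figure = "profit") %>%
  ungroup() %>%
  bind_rows(company_a, .)
```

Dropping the filter() step and binding to temp instead should give the profit rows for every company, since company is part of the grouping.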