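I have very little experience with R. I am not sure whether I should mimic Excel or whether there is a better way to do simple Excel-style cell subtraction, so I do not know how to do the following calculation in R.
I have the following data in R.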
year marketplace bridged_on value
01/01/2018 US A 1,710,103,328
01/01/2018 US B 1,710,103,328
01/01/2018 US C 1,710,103,328
01/01/2018 US D 1,710,103,328
01/01/2019 US A 1,669,210,438
01/01/2019 US B 1,653,940,292
01/01/2019 US C 1,624,487,359
01/01/2019 US D 1,617,335,174
01/01/2020 US A 1,674,636,402
01/01/2020 US B 1,647,437,876
01/01/2020 US C 1,601,234,000
01/01/2020 US D 1,591,107,584
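I need to calculate the year-over-year change. In Excel, I create a pivot table with the years as columns and then apply subtraction formulas in the cells.
Here is a screenshot of the calculation in Excel. I am calculating the differences between A and B, B and C, and C and D, and then subtracting the same difference from the previous year. For example, the calculation in H6 is (C6-C7)-(D6-D7).
I am not sure how to reproduce the same calculation in R and get G5 to H8 as the output.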
Answer 0 (score: 0)
library(dplyr)
library(stringr)
library(purrr)
library(lubridate)
library(readr)
library(reshape2)
data <- read_delim("year marketplace bridged_on value
01/01/2018 US A 1,710,103,328
01/01/2018 US B 1,710,103,328
01/01/2018 US C 1,710,103,328
01/01/2018 US D 1,710,103,328
01/01/2019 US A 1,669,210,438
01/01/2019 US B 1,653,940,292
01/01/2019 US C 1,624,487,359
01/01/2019 US D 1,617,335,174
01/01/2020 US A 1,674,636,402
01/01/2020 US B 1,647,437,876
01/01/2020 US C 1,601,234,000
01/01/2020 US D 1,591,107,584", delim = " ")
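# Trim stray whitespace from the column names and from every column
colnames(data) <- str_trim(colnames(data))
data <- map_dfc(data, str_trim)
# Parse the date strings and strip the thousands separators from value
data <- data %>%
  mutate(year = mdy(year),
         value = parse_number(value))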
# Display the cleaned data
> data
# A tibble: 12 x 4
   year       marketplace bridged_on      value
   <date>     <chr>       <chr>           <dbl>
 1 2018-01-01 US          A          1710103328
 2 2018-01-01 US          B          1710103328
 3 2018-01-01 US          C          1710103328
 4 2018-01-01 US          D          1710103328
 5 2019-01-01 US          A          1669210438
 6 2019-01-01 US          B          1653940292
 7 2019-01-01 US          C          1624487359
 8 2019-01-01 US          D          1617335174
 9 2020-01-01 US          A          1674636402
10 2020-01-01 US          B          1647437876
11 2020-01-01 US          C          1601234000
12 2020-01-01 US          D          1591107584
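I believe your calculation in row 8 is wrong: according to the formula you provided, it uses the total in the calculation.
To do this in R, you need to keep the data frame in long format and use dplyr::lag() to calculate the difference between years. Finally, use reshape2::dcast() to convert from long format to wide format.
You can break up the pipe to see the intermediate result of each step.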
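To make the two differencing steps concrete, here is a minimal sketch that works out a single cell (bridge B, year 2019) by hand, using values copied from the data above. The variable names are made up for illustration. The mapping of Excel columns C/D to 2018/2019 and rows 6/7 to A/B is an assumption, since the screenshot is not reproduced here.

```r
# Values copied from the data above (variable names are illustrative only)
a_2018 <- 1710103328; a_2019 <- 1669210438
b_2018 <- 1710103328; b_2019 <- 1653940292

# Step 1: year-over-year change for each bridge
a_change <- a_2019 - a_2018
b_change <- b_2019 - b_2018

# Step 2: difference between consecutive bridges within the same year
b_vs_a <- b_change - a_change

# Excel-style formula (C6-C7)-(D6-D7), ASSUMING C/D are the 2018/2019
# columns and rows 6/7 are bridges A/B
excel_style <- (a_2018 - b_2018) - (a_2019 - b_2019)

print(b_vs_a)
print(excel_style)
```

Under that assumed cell mapping, both expressions are the same quantity with the terms rearranged, so the order of the two subtractions does not matter.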
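result <- data %>%
  mutate(year = year(year)) %>%                    # keep only the calendar year
  group_by(bridged_on) %>%
  mutate(annual_diff = value - lag(value)) %>%     # year-over-year change per bridge
  ungroup() %>%
  dplyr::filter(!is.na(annual_diff)) %>%           # 2018 has no previous year
  group_by(year) %>%
  mutate(annual_diff2 = annual_diff - lag(annual_diff)) %>%  # change vs. the previous bridge
  dplyr::filter(!is.na(annual_diff2)) %>%          # A has no previous bridge
  select(year, bridged_on, annual_diff2) %>%
  ungroup() %>%
  # value.var is named explicitly so dcast() does not have to guess it
  dcast(bridged_on ~ year, value.var = "annual_diff2")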
> result
  bridged_on      2019      2020
1          B -15270146 -11928380
2          C -29452933 -16750943
3          D  -7152185  -2974231
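Following the suggestion to break up the pipe, the sketch below splits the calculation into two named intermediate steps so each can be inspected. It is only an illustration: the data frame is typed in directly as numbers (an assumption, to keep the example self-contained), and the names step1 and step2 are made up here.

```r
library(dplyr)

# Assumption: the same 12 rows as above, entered directly as numbers,
# with year already reduced to the calendar year
data <- data.frame(
  year       = rep(2018:2020, each = 4),
  bridged_on = rep(c("A", "B", "C", "D"), times = 3),
  value      = c(1710103328, 1710103328, 1710103328, 1710103328,
                 1669210438, 1653940292, 1624487359, 1617335174,
                 1674636402, 1647437876, 1601234000, 1591107584)
)

# Step 1: year-over-year change within each bridge.
# lag() relies on the rows being ordered by year within each group.
step1 <- data %>%
  group_by(bridged_on) %>%
  mutate(annual_diff = value - lag(value)) %>%
  ungroup() %>%
  dplyr::filter(!is.na(annual_diff))

# Step 2: within each year, subtract the previous bridge's change
step2 <- step1 %>%
  group_by(year) %>%
  mutate(annual_diff2 = annual_diff - lag(annual_diff)) %>%
  ungroup() %>%
  dplyr::filter(!is.na(annual_diff2))

print(step1)
print(step2)
```

Printing step1 shows the plain year-over-year change for every bridge, including A, which is a convenient place to compare against the Excel pivot table before the second subtraction is applied.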