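Once again I turn to you, the great Stack Overflow community, for help. I have posted this question more than once, but somehow I cannot solve this seemingly simple problem... To be honest, I'm getting a bit frustrated, so I'd be very grateful to anyone who can help.
Several answers to similar questions on Stack Overflow work on my small reproducible data frame, but when I apply the same strategy to my original data frame, it doesn't work. So first, a quick translation of the variables (they are in Dutch):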
Gemeente == municipality
jaar == year
Beleidscode ~ crime category
aantal_misdrijven == number of crimes
Kennisnamedatum == date
weekdag == weekday

My question:
I want to calculate the change between 2015 and 2017, grouped by Gemeente and Beleidscode.
library(tidyverse)
# This will download my original data frame with ease:
df <- read_csv("https://github.com/thomasdebeus/colourful-facts/raw/master/projects/crime_dataset.csv")
# The following tries to first add a column with totals per
# year, municipality and crime category. Then calculate percentage change.
df %>%
  group_by(Gemeente, jaar, Beleidscode) %>%
  arrange(Gemeente, jaar, Beleidscode) %>%
  summarise(per_jaar_Gem_misdrijf = sum(aantal_misdrijven)) %>%
  mutate(perct_change = (per_jaar_Gem_misdrijf - lag(per_jaar_Gem_misdrijf, order_by = jaar)) / lag(per_jaar_Gem_misdrijf, order_by = jaar)) %>%
  ungroup()
So yes, as you may have guessed, this does not produce the correct numbers... I hope someone can help.
Answer 0 (score: 0)
It looks like you are trying to calculate the percentage change from the previous year. One way to do this is:
library(tidyverse)
# This will download my original data frame with ease:
df <- read_csv("https://github.com/thomasdebeus/colourful-facts/raw/master/projects/crime_dataset.csv")
# Create a data frame with the summary count by year
dfSumByYear <-
  df %>%
  group_by(Gemeente, jaar, Beleidscode) %>%
  summarise(per_jaar_Gem_misdrijf = sum(aantal_misdrijven)) %>%
  ungroup()
# Add the previous year's counts as an additional column:
# shifting jaar forward by one lets each year join to the year before it
dfSumByYearWithPrev <-
  dfSumByYear %>%
  left_join(dfSumByYear %>%
              mutate(JaarJoin = jaar + 1) %>%
              rename(per_PrevJaar_Gem_misdrijf = per_jaar_Gem_misdrijf) %>%
              select(-jaar),
            by = c("Gemeente", "jaar" = "JaarJoin", "Beleidscode")) %>%
  # Calculate the percentage change; a missing previous year is treated as 0,
  # so the result is Inf (or NaN) for those rows
  mutate(perct_change = (coalesce(per_jaar_Gem_misdrijf, 0) - coalesce(per_PrevJaar_Gem_misdrijf, 0)) / coalesce(per_PrevJaar_Gem_misdrijf, 0))
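As a cross-check on the year-over-year logic, here is a minimal sketch that uses lag() instead of a self-join. It runs on a small made-up data frame (toy is hypothetical; only the column names come from the question) and assumes dplyr 1.0.0 or later for the .groups argument. The key point is that the data must be grouped by Gemeente and Beleidscode, not by jaar, when lag() is applied.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the real data set
toy <- data.frame(
  Gemeente          = rep("A", 7),
  jaar              = c(2015, 2015, 2016, 2017, 2015, 2016, 2017),
  Beleidscode       = c("x", "x", "x", "x", "y", "y", "y"),
  aantal_misdrijven = c(4, 6, 20, 30, 4, 2, 3),
  stringsAsFactors  = FALSE
)

yoy <- toy %>%
  # Total per municipality, crime category and year
  group_by(Gemeente, Beleidscode, jaar) %>%
  summarise(per_jaar_Gem_misdrijf = sum(aantal_misdrijven), .groups = "drop") %>%
  # Regroup WITHOUT jaar so that lag() looks back across years
  group_by(Gemeente, Beleidscode) %>%
  arrange(jaar, .by_group = TRUE) %>%
  mutate(
    prev         = lag(per_jaar_Gem_misdrijf),
    perct_change = (per_jaar_Gem_misdrijf - prev) / prev
  ) %>%
  ungroup()

print(yoy)
```

Note that lag() returns the previous available row within the group, not necessarily the previous calendar year. If a year is missing for some municipality and category, the join on jaar + 1 shown above is the safer choice.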
If you specifically want to calculate the change from 2015 to 2017, one way to do it is:
library(tidyverse)
# This will download my original data frame with ease:
df <- read_csv("https://github.com/thomasdebeus/colourful-facts/raw/master/projects/crime_dataset.csv")
# Calculate the percentage change from 2015 to 2017
df %>%
  group_by(Gemeente, jaar, Beleidscode) %>%
  summarise(per_jaar_Gem_misdrijf = sum(aantal_misdrijven)) %>%
  ungroup() %>%
  # One column per year; combinations with no rows in a year are filled with 0
  spread(jaar, per_jaar_Gem_misdrijf, fill = 0L) %>%
  mutate(perct_change = (`2017` - `2015`) / `2015`)
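The same 2015-to-2017 comparison can also be sketched with pivot_wider(), which tidyr (1.0.0 and later) offers as the successor to spread(). The example below runs on a small made-up data frame (toy and the jaar_ column prefix are my own choices, not part of the original data) and returns NA instead of Inf when there were no crimes in 2015; whether NA is the right answer for those cases is an assumption you may want to change.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; municipality "B" has no records for 2015
toy <- data.frame(
  Gemeente          = c("A", "A", "A", "B", "B"),
  jaar              = c(2015, 2016, 2017, 2016, 2017),
  Beleidscode       = c("x", "x", "x", "x", "x"),
  aantal_misdrijven = c(10, 20, 30, 5, 8),
  stringsAsFactors  = FALSE
)

change_15_17 <- toy %>%
  # Only the two years being compared are needed
  filter(jaar %in% c(2015, 2017)) %>%
  group_by(Gemeente, Beleidscode, jaar) %>%
  summarise(n = sum(aantal_misdrijven), .groups = "drop") %>%
  # One column per year: jaar_2015 and jaar_2017; missing combinations become 0
  pivot_wider(
    names_from   = jaar,
    values_from  = n,
    names_prefix = "jaar_",
    values_fill  = list(n = 0)
  ) %>%
  # Avoid dividing by zero: report NA when the 2015 base is 0
  mutate(
    perct_change = if_else(jaar_2015 == 0, NA_real_,
                           (jaar_2017 - jaar_2015) / jaar_2015)
  )

print(change_15_17)
```

Prefixing the year columns avoids the backticks needed for names such as `2017`, which makes the later mutate() a little easier to read.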