I have a dataset like the following:
CRASH     CRASH_DATE  geoid          CRASH_TIME
41259861  2015-12-24  2502312044025  1056
41243891  2015-12-19  2502312044025  559
41243791  2015-12-17  2502312044025  1436
41256041  2015-12-22  2502312044007  1647
41255881  2015-12-17  2502312044007  2022
...
The final output data frame I want looks like this:
geoid          average_per_week  variance_per_week
2502312044025  x                 t
2502312044007  y                 v
...
I want the average and variance of the weekly crash count within each area (geoid). My first attempt was:
aggregate(Crash[["geoid"]],by=list(week(Crash[["CRASH_DATE"]])),mean)
But it raises an error.
Answer 0 (score: 0)
library(dplyr); library(lubridate)
options(scipen = 99) # To display geoid without scientific notation.

# Step 0. Load data
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
CRASH CRASH_DATE geoid CRASH_TIME
41259861 2015-12-24 2502312044025 1056
41243891 2015-12-19 2502312044025 559
41243791 2015-12-17 2502312044025 1436
41256041 2015-12-22 2502312044007 1647
41255881 2015-12-17 2502312044007 2022") %>%
  # Step 1. Count incidents by geoid and week
  group_by(geoid, week = floor_date(ymd(CRASH_DATE), "1 week")) %>%
  tally() %>%
  # Step 2. Calculate the average and variance. tally() drops the week
  # grouping, so summarize() runs once per geoid. Note: weeks with no
  # incidents are absent from the counts. If you want every geoid measured
  # over a common time span, you might want to add padr::pad() here,
  # with start_val etc.
  summarize(avg = mean(n), variance = var(n))
> df
# A tibble: 2 x 3
          geoid   avg variance
          <dbl> <dbl>    <dbl>
1 2502312044007   1        0
2 2502312044025   1.5      0.5
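
The note in Step 2 about gaps matters because a week with no crashes never appears in the tally, which inflates the average and shrinks the variance. The following is a minimal sketch of one way to fill those weeks with zeros. It swaps in tidyr::complete() rather than the padr::pad() mentioned above, and the common span (weeks starting 2015-12-06 through 2015-12-20) is a made-up assumption for illustration, not something taken from the question.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Same sample rows as above (CRASH_TIME omitted, it is not needed here).
crashes <- data.frame(
  CRASH = c(41259861, 41243891, 41243791, 41256041, 41255881),
  CRASH_DATE = c("2015-12-24", "2015-12-19", "2015-12-17",
                 "2015-12-22", "2015-12-17"),
  geoid = c(2502312044025, 2502312044025, 2502312044025,
            2502312044007, 2502312044007),
  stringsAsFactors = FALSE
)

# Count crashes per geoid and week (weeks start on Sunday).
weekly <- crashes %>%
  mutate(week = floor_date(ymd(CRASH_DATE), "week", week_start = 7)) %>%
  count(geoid, week)

# ASSUMPTION: hypothetical common span of three weeks; replace it with the
# real study period.
all_weeks <- seq(ymd("2015-12-06"), ymd("2015-12-20"), by = "week")

# Add a row with n = 0 for every geoid/week combination that had no crashes,
# then compute the per-geoid average and variance over the full span.
result <- weekly %>%
  complete(geoid, week = all_weeks, fill = list(n = 0L)) %>%
  group_by(geoid) %>%
  summarize(avg = mean(n), variance = var(n))

print(result)
```

With the assumed three-week span, each geoid gains one zero-count week, so the averages come out lower than in the output above. If no week is missing from the span, the result matches the original pipeline.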