Average number of crashes per week by area in R

Time: 2018-10-03 19:39:43

Tags: r dataframe aggregation

I have a dataset like the following:

CRASH       CRASH_DATE  geoid          CRASH_TIME
41259861    2015-12-24  2502312044025   1056
41243891    2015-12-19  2502312044025   559
41243791    2015-12-17  2502312044025   1436
41256041    2015-12-22  2502312044007   1647
41255881    2015-12-17  2502312044007   2022
...

The output data frame I want looks like this:

    geoid           average_per_week   variance_per_week
    2502312044025       x                 t
    2502312044007       y                 v
...

I want the average and the variance of the number of crashes per week within each area (geoid). My first attempt was:

aggregate(Crash[["geoid"]],by=list(week(Crash[["CRASH_DATE"]])),mean)

But this throws an error.

1 answer:

Answer 0 (score: 0)

library(dplyr); library(lubridate)
options(scipen = 99) # To display geoid w/o scientific notation.

# Step 0. Load data
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
CRASH       CRASH_DATE  geoid          CRASH_TIME
41259861    2015-12-24  2502312044025   1056
41243891    2015-12-19  2502312044025   559
41243791    2015-12-17  2502312044025   1436
41256041    2015-12-22  2502312044007   1647
41255881    2015-12-17  2502312044007   2022") %>%

# Step 1. Count incidents by geoid and week
  group_by(geoid, week = floor_date(ymd(CRASH_DATE), "1 week")) %>%
  tally() %>%

# Step 2. Calc avg and variance. Note, if there are gaps in between incidents
# and you want to use a common time span, you might want to add
# padr::pad() here, with start_val etc.
  summarize(avg = mean(n), variance = var(n))


> df
# A tibble: 2 x 3
          geoid   avg variance
          <dbl> <dbl>    <dbl>
1 2502312044007   1        0  
2 2502312044025   1.5      0.5
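
The comment in Step 2 mentions gaps between incidents: a week with no crashes for a geoid never shows up in the tally, so it is left out of the mean and variance instead of being counted as zero. The answer suggests padr::pad() for this. The sketch below swaps in tidyr::complete() instead, which is an assumption on my part and not part of the original answer. The geoid values "A" and "B" and the dates are made-up illustration data, chosen so that "A" has one empty week.

```r
library(dplyr)
library(lubridate)
library(tidyr)  # assumption: tidyr is installed; the original answer does not use it

# Hypothetical data: geoid "A" has no crash in the week starting 2015-12-06.
crashes <- data.frame(
  CRASH_DATE = c("2015-12-03", "2015-12-17", "2015-12-19",
                 "2015-12-01", "2015-12-08", "2015-12-15"),
  geoid = c("A", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Count crashes per geoid and week (weeks start on Sunday by default).
weekly <- crashes %>%
  mutate(week = floor_date(ymd(CRASH_DATE), "week")) %>%
  count(geoid, week)

# Add a zero-count row for every geoid/week pair missing from the
# common span between the first and last observed week.
padded <- weekly %>%
  complete(geoid,
           week = seq(min(week), max(week), by = "week"),
           fill = list(n = 0L))

# Average and variance of the weekly counts, now including empty weeks.
result <- padded %>%
  group_by(geoid) %>%
  summarize(avg = mean(n), variance = var(n))

print(result)
```

With this padding, the empty week for "A" enters the calculation as a zero, so its weekly counts are 1, 0 and 2 rather than just 1 and 2. Whether a common time span is the right choice depends on whether every area was actually observed over the whole period.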